Part B: More Instructions

Lab 6 is required for Project 3B, and lectures 16-19, Discussion 9-10, and Homework 4-5 are highly recommended.

In this part, you will expand your CPU to support more instructions and pipelining.

Before starting this part, please run git pull starter main to download any changes we may have pushed. If you see an error about "unrelated histories" run git fetch starter main-fix && git merge starter/main-fix instead.

You can implement the instructions in any order you want. This spec is organized to build up your CPU with small groups of instructions at a time.

Note: Your circuit may not rely on floating (undefined) values. While tests may pass locally, it will violate design rule checks on Gradescope.

Task 5: I-type Instructions

Implement the following Base RV32I I-Type instructions according to the course reference card:

addi
andi
ori
xori
slli
srli
srai
slti

Task 5.1: Datapath

Recall that you already implemented addi in Part A. Other I-type instructions use the same datapath as addi, except that each I-type instruction needs the ALU to perform a different operation.

In Part A, we hard-coded the ALUSel input to the ALU subcircuit to be 0b0000 so that the ALU always performs the addition selection, but now you should change ALUSel input to the ALU subcircuit to use the value from the control logic subcircuit (which you'll implement in the next task).
Remember to also change the RegWEn input to the regfile subcircuit to use the value from the control logic subcircuit.

Task 5.2: Control Logic

As you add logic to support more instructions in the next few tasks, you will need to add control logic to enable the relevant datapath components depending on the instruction being executed.

Task: Modify control-logic.circ to output the correct control logic signals for I-type instructions.

The control logic subcircuit takes the instruction bits and outputs all the control signals needed to execute that instruction. Read the course notes for details.

Our suggested approach ("ROM Approach"):

Build the control logic truth table in a spreadsheet.
Copy values from the spreadsheet into the ROM subcircuit lookup table.
Wire a priority encoder to select the correct row of the ROM table.
Use combinational logic for PCSel.

You can also implement the control logic entirely with combinational logic. It is up to you.

ROM Subcircuit

Make a copy of this spreadsheet: [61C SP26] Project 3B ROM Control Logic.
For each instruction, fill in the spreadsheet cells under the "Control Signals" header. Do not modify any cells under other headers. See below.
Copy the data in the "ROM Output" column (not including the headers).
Open control-logic.circ in Logisim, and click on the ROM.
In the properties tab on the left-hand sidebar, click on "(click to edit)" next to the "Contents" label.
Click on the upper-left-most data cell (which will be a collection of 4 hex digits without a preceding 0x) and paste your values from earlier into the ROM. You should see the ROM control bits change to the values from your spreadsheet.
Click "Close Window" to exit the ROM programming view.

The below two columns of the spreadsheet are automatically populated for you:

"ROM Input": This is the value that will be passed into the ROM. This is the input passed into the ROM for each instruction.
"ROM Output": This is the value that will be outputted from the ROM. This output is all the control signals concatenated together.

For example, when passed in an addi instruction, your control logic needs to pass 15 (0b1111) into the ROM so the correct entry of control signals can be selected.

Expand the below for things to keep in mind when filling out the spreadsheet.

Spreadsheet Details

Enter binary digits without the leading 0b (type 01 instead of 0b01).
- If the upper-right corner of the control signal's cell turns red, or if you see a warning when you hover over the cell, your cell contents are invalid. Common sources of errors include entering a control signal that has too many or too few bits, or entering characters other than 0 or 1 in a control logic cell.
- If a control signal is "don't care" for a certain instruction, you can put any valid value in the corresponding cell.
- Make sure to fill in every cell in a given row. If a row has empty cells, the ROM output value is unreliable (garbage).
- You can use any encoding (which binary digits correspond to which control logic cases) for your control logic, as long as your encoding is consistent with your circuit. For example:
  - Suppose you choose to set ASel = 0 for instructions that input the data in RegReadData1 to the ALU, and ASel = 1 for instructions that input the PC to the ALU.
  - When you make a mux with ASel as the select bit, input 0 should be RegReadData1 and input 1 should be the PC.
  - You could also use the opposite encoding (ASel = 0 for PC and ASel = 1 for RegReadData1) and flip the mux inputs, and the circuit would still behave the same way.

ROM Input: Use Priority Encoder

Once you have the spreadsheet ready, you'll need to add wiring to select which row of the ROM table you want to use for a given instruction. In other words, you'll need a block that can take an instruction and output the bits of the corresponding "ROM Input" column fo the spreadsheet.

We suggest using a Priority Encoder block in Logisim. A Priority Encoder block has a number of inputs, zero-indexed. The component determines the indices of inputs whose values are high and emits the highest index. For example, if inputs 0, 2, 5, and 6 are all 1, then the priority encoder emits a value of 110. Read more.

Priority Encoder: Another example

Consider a toy example of an ALU that only performs 3 operations: add (ALUSel=0), mul (ALUSel=1), sub (2). We would like to use a Priority Encoder to determine the correct ALUSel signal from three tunnels: is_add, is_mul, and is_sub.

To generate the ALUSel control signal, we can connect the is_add, is_mul, and is_sub tunnels to their corresponding ALUSel position (indices 0, 1, 2 for add, mul, and sub, respectively) on the Priority Encoder. The Priority Encoder will return the position of the active input signal, which in this case is the correct ALUSel value.

Example of a priority encoder.

Note: In case there are multiple active signals at the same time, a Priority Encoder returns the largest active position. However, you should try to avoid having multiple active signals at once for this use case. After all, a single instruction can't require the ALU to perform both add and multiply at once.

Testing and Debugging

Tip: To determine what instruction is being inputted into the control logic subcircuit, it might help to use comparators and constants to compare bits of the instruction against some fixed constant value.

We don't have any provided tests for I-type instructions, so you'll need to write your own custom tests. Before requesting help from staff, please make sure you have some tests written, or we'll ask you to write some tests first before helping you.

Task 6: R-type Instructions

Implement the following Base RV32I R-Type instructions according to the course reference card:

add
sub
and
or
xor
sll
srl
sra
slt

Additionally, implement the mul, mulh, and mulhu general multiplication instructions (part of the RISC-V "M" extension). See the course notes for details.

Task 6.1: Datapath

Modify your datapath in cpu.circ so that it can support R-type instructions.

If you're stuck, read further for some guiding questions. As with Task 4, it may help to think about each of the five stages for executing an instruction.

Instruction Fetch: How do R-type instructions affect the program counter?

R-type instructions always increment the program counter by 4 to fetch the next instruction, just like the addi instruction from Part A. This means we don't need to modify the program counter implementation for this task.

Instruction Decode: What do we need to read from the register file?

R-type instructions require reading the values of two source registers (rs1 and rs2) from the register file. In Part A, you split the rs1 bits from the instruction and passed them to the regfile. Now, you should also split the rs2 bits from the instruction and pass them to the regfile.

Execute: What two data values (A and B) should an R-type instruction input to the ALU?

R-type instructions pass the register values from the regfile into the ALU. In Part A, you already passed the first register value RegReadData1 into the first input of the ALU. However, for the addi instruction, the second input of the ALU is an immediate. Since you want to support both R-type instructions and the addi instruction, you should use a multiplexer to select which input will be inputted to the ALU.

The select bit of this multiplexer is BSel. You will implement the logic for determining BSel from the instruction bits in the control logic later in this task.

Memory: Do R-type instructions write to memory?

R-type instructions do not write to memory (they write to a register on the CPU, which is different from memory). This means we don't need to modify DMEM for this task.

Write back: What data is the R-type instruction writing, and where is the instruction writing this data to?

R-type instructions take the result of the computation (from the ALU output) and write the result to the register rd. In Part A, you already implemented logic to write the ALU output into a destination register.

Task 6.2: Control Logic

Modify control-logic.circ to output the correct control logic signals for R-type instructions.

Testing and Debugging

You will also need to write custom tests for R-type instructions; we don't have any provided tests. Before requesting help from staff, please make sure you have some tests written, or we'll ask you to write some tests first before helping you.

Task 7: B-type Instructions

Implement the following Base RV32I I-Type instructions according to the course reference card:

beq
bge
bgeu
blt
bltu
bne

Task 7.1: Branch Comparator

Fill in the branch comparator subcircuit in branch-comp.circ. See the input/output signals of the Branch Comparator in the course notes.

We've provided some unit tests for the branch comparator subcircuit. These are not comprehensive. You can run these tests with bash test.sh test_branch_comp.

Task 7.2: Immediate Generator

Edit the immediate generator in imm-gen.circ so that it can generate immediates for B-type instructions in addition to immediates for I-type instructions (which you implemented in Part A).

See the input/output signals of the Immediate Generator in the course notes. You will need to use the ImmSel signal (which you will implement in the control logic) to determine which type of immediate this subcircuit should generate.

We recommend reviewing the immediate types in the course notes.

Testing:

We've provided some unit tests for the immediate generator subcircuit, called test_imm_gen. These are not comprehensive.
Note that if you only implement generating B-type immediates now, some tests in test_imm_gen for other immediate types will fail, but make sure that the imm-gen-b-type test passes.

[Optional] Customize ImmSel encodings

The ImmSel values in the table represent the default encoding (mapping of ImmSel values to immediate types). If you choose to use a different encoding:

Navigate to tests/unit-imm-gen.
Open imm-gen-encoding.csv.
Replace the numbers with your selected encoding (in decimal). For example, if you're using ImmSel = 0b110 to denote an I-type instruction, the second line should say I,6.
Run the unit tests with bash test.sh test_imm_gen.

Task 7.3: Datapath

Modify your datapath in cpu.circ so that it can support B-type instructions.

If you're stuck, read further for some guiding questions. As with Task 4, it may help to think about each of the five stages for executing an instruction.

Instruction Fetch: How do B-type instructions affect the program counter?

Recall that branching instructions add an immediate to the current value of PC. If the branch is taken, the PC changes to be the result of this addition. If the branch is not taken, or the instruction is not an B-type instruction, then PC changes to PC+4 (just like in the previous tasks). We will implement this in the write-back stage.

Instruction Decode: What do we need to read from the register file?

B-type instructions have two source registers, rs1 and rs2, that we need to read from the register file. In the previous task, you already implemented reading the values in rs1 and rs2 for R-type instructions.

Execute: What two data values (A and B) should an B-type instruction input to the ALU?

B-type instructions use the ALU to add an immediate to PC. You will need to add a multiplexer so that the ALU can receive either PC or the value in rs1, depending on the instruction being executed. The select bit of this multiplexer is ASel. In the previous tasks, you already implemented sending an immediate to the ALU.

Memory: Do B-type instructions write to memory?

B-type instructions do not write to memory. This means we don't need to modify DMEM for this task.

Write back: What data is the B-type instruction writing, and where is the instruction writing this data to?

B-type instructions take the result of the addition (PC + immediate, from the ALU output) and might write the result to PC (depending on if the branch is taken). You should use a multiplexer to select which value will be written to PC.

The select bit of this multiplexer is PCSel. You will implement the logic for determining PCSel from the instruction bits in the control logic.

Task 7.4: Control Logic

Modify control-logic.circ to output the correct control logic signals for B-type instructions.

Testing and Debugging

Run the test_integration_branch integration tests. Follow the Debugging Integration Tests example in the Testing and Debugging Section.

These tests are not comprehensive, so you should write your own custom tests.

Task 8: Loading and Storing

Implement the following Base RV32I load and store instructions according to the course reference card:

lb
lh
lw
sb
sh
sw

Task 8.1: Immediate Generator

Edit the immediate generator in imm-gen.circ so that it can generate immediates for S-type instructions in addition to all the instruction types from previous tasks. See the earlier immediate generator task for details.

Testing: Like earlier, find some unit tests in test_imm_gen, which are not comprehensive. Like earlier, if you only implement generating S-type immediates now, some tests in test_imm_gen for other immediate types will fail, but make sure that the imm-gen-s-type test passes.

Task 8.2: Partial Loads and Stores

Implement the partial load and store circuits according to the Partial Loads and Stores section in the course notes.

Task 8.2.1: Fill in the partial_load.circ subcircuit.

Task 8.2.2: Fill in the partial_store.circ subcircuit.

Testing and debugging: Unit tests are partial_load and partial_store. These are not comprehensive.

Task 8.3: Datapath

With the help of the partial load and partial store circuits you've just made, modify your datapath in cpu.circ so that it can support loads and stores.

Reminders:

You should provide an address input MemAddress to DMEM. Remember that the ALU calculates this address by adding the address in the rs1 register and the offset immediate.
You should also provide MemWriteMask and MemWriteData to DMEM. These are calculated by your partial load and partial store subcircuits.
For load instructions, you should also add functionality in the write-back stage so that the DMEM output data, processed by your partial load subcircuit, is written back to the rd register.

Task 8.4: Control Logic

Modify control-logic.circ to output the correct control logic signals for loads and stores.

Testing and Debugging

You'll need to write your own custom tests. Before requesting help from staff, please make sure you have some tests written, or we'll ask you to write some tests first before helping you.

In test_integration_mem we have provided some tests for load and store instructions, but they require lui to be implemented first.

Task 9: Jumps and U-type Instructions

Implement the following Base RV32I U-Type and Jump instructions according to the course reference card:

jal
jalr
auipc
lui

Task 9.1: Immediate Generator

Edit the immediate generator in imm-gen.circ so that it can generate immediates for U-type instructions and J-type instructions. See the earlier immediate generator task for details.

Testing: Like earlier, find some unit tests in test_imm_gen, which are not comprehensive.

Task 9.2: Datapath

Modify your datapath in cpu.circ so that it can support these instructions. Most of these instructions are already supported by your datapath so far.

Reminders:

Note that the U-type instructions require left-shifting the immediate by 12 bits (e.g. lui is written as rd = imm << 12 on the reference card), but this should already be done by your immediate generator, so your datapath doesn't need to perform any extra shifting.
To support jalr, you should connect PC+4 to your multiplexer in the write-back stage so that PC+4 can be written back to rd.

Task 9.3: Control Logic

Modify control-logic.circ to output the correct control logic signals for jumps and U-type instructions.

Hint: Be careful about which ALU operation you're performing for the lui instruction. One of the ALU operations you made in Part A but didn't use anywhere else will come in handy here.

Testing and Debugging

We have provided some tests for jump instructions and lui (but not auipc): test_integration_jump and test_integration_lui. These tests are not comprehensive, so you should write your own custom tests.

Task 10: Pipelining

Warning: Do not move on to this task until you have finished Tasks 1-9.

In this task, you will implement a 2-stage pipeline in your CPU:

Instruction Fetch: An instruction is fetched from the instruction memory.
Execute: The instruction is decoded, executed, and committed (written back). This is a combination of the remaining four stages of a classic five-stage RISC-V pipeline (ID, EX, MEM and WB).

The separation between the two pipeline stages (highlighted by the green dividing line on the datapath) is illustrated below.

Pipeline diagram. Separation line is right after +4 block and IMEM.

Task 10.1: Getting Started

To get started, first think about which paths will have intermediate pipeline registers in them. Look at the provided illustration above and consider all the paths that intersect the dividing line. Paths that transfer data to the rest of the datapath (data going from left to right) will have corresponding pipeline registers in them, while feedback paths (data going from right to left) will not.

Think about which values are now different between the two stages of the pipeline. For example, will stage 1 and stage 2 have the same or different PC values? If the stages need different PCs, then you now need two different PC values in your circuit at any given time step.

Once you've listed out which values are different between the stages (hint: there aren't many), you'll need to store those values between the pipelining stages.

Finally, go through your entire circuit and make sure that you specify which stage's value you want to use for any values that are different between stages. For example, if the stages need different PCs, then any time you use PC in your circuit, you should specify whether you want to use the stage 1 PC, or the stage 2 PC.

Note: During the first cycle, the instruction register sitting between the pipeline stages won't contain an instruction loaded from memory. What should the second stage do? Luckily, Logisim automatically sets registers to zero on reset, so the instruction pipeline register will automatically start with a no-op! If you wish, you can depend on this behavior of Logisim.

Task 10.2: Hazards

Since your CPU will support branch and jump instructions, you'll need to handle control hazards that occur when branching.

The instruction immediately after a branch or jump should not be executed if a branch is taken. By the time you send a branch/jump instruction into stage 2, stage 1 has already fetched (possibly) the wrong next instruction. Therefore, you will need to flush the instruction fetched in stage 1 by replacing it with a no-op. You should flush the stage 1 instruction only if a branch is taken in the stage 2 instruction (do not flush if it is not taken). You should always flush the stage 1 instruction when the stage 2 instruction is a jump.

Hint: One of the control logic signals will tell you whether a branch or a jump is taken. You can use this control logic signal (from stage 2) in your stage 1 logic to determine when you need to flush the pipeline.

To flush an instruction, your stage 1 logic should send a no-op instruction into stage 2 instead of using the fetched instruction. You can use addi x0, x0, 0 (0x00000013) as a no-op.

Some more things to consider:

To MUX a no-op into stage 2, do you place it before or after the instruction register?
What address should be requested next while the EX stage executes a no-op? Is this different than normal?

Testing and Debugging

Add the --pipelined or -p flag to the testing commands to run the tests from the previous tasks on your pipelined CPU. For example:

bash test.sh run_custom -p
bash test.sh test_integration_branch -p
bash test.sh test_integration_immediates -p

Note that your pipelined CPU will no longer pass the non-pipelined tests (i.e. if you run tests without -p, they'll fail).

Task 11: Partner/Feedback Form

Congratulations on finishing the project! We'd love to hear your feedback on what can be improved for future semesters.

Please fill out this short form, where you can offer your thoughts on the project and (if applicable) your partnership. Any feedback you provide won't affect your grade, so feel free to be honest and constructive.

Submission and Grading

Submit your assignment to the Project 3B submission on Gradescope. Part B is worth 80% of your overall Project 3 grade.

Instructions (1.5 points each, 54 points total)
Integration tests (2 points each, 24 points total)
Feedback form (2 points)

Total: 80 points