1 Learning Outcomes
Coming soon! We provide the animations for now.
🎥 Lecture Video
2 Building a processor that adds
To start off, let’s build the simplest processor we can: a processor that supports only one instruction, add. Programs will just be a series of adds:
add x18 x18 x10
add x18 x18 x18
add ...

In order to support add in our datapath, we consider the two state elements changed by this instruction’s operations:
- RegFile: The instruction performs the operation R[rd] = R[rs1] + R[rs2]. This involves reading two registers, rs1 and rs2, adding the values together, and writing the result to register rd.
- PC: The instruction performs PC = PC + 4 to advance the program counter to the next instruction to execute. This involves reading the PC register, adding 4, and writing the result back to the PC register.
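As a minimal sketch of these two state updates, the Python below models the register file as a 32-entry list and the PC as an integer. Both results are masked to 32 bits, since the adders simply drop any carry out. The function name add_step is hypothetical, and the sketch assumes, as in RV32I, that x0 is hardwired to zero so writes to it are ignored.

```python
MASK32 = 0xFFFF_FFFF  # all RV32I register values are 32 bits wide


def add_step(regs: list[int], pc: int, rd: int, rs1: int, rs2: int) -> int:
    """Apply one `add rd rs1 rs2`: update regs in place and return the next PC."""
    result = (regs[rs1] + regs[rs2]) & MASK32  # R[rd] = R[rs1] + R[rs2], carry out dropped
    if rd != 0:  # assumption: x0 is hardwired to zero, so writes to it are ignored
        regs[rd] = result
    return (pc + 4) & MASK32  # PC = PC + 4


regs = [0] * 32
regs[10] = 5
regs[18] = 0xFFFF_FFFF
pc = add_step(regs, 0, 18, 18, 10)  # add x18 x18 x10
print(hex(regs[18]), pc)  # 0x4 4
```

The example deliberately overflows x18 to show that 0xFFFFFFFF + 5 wraps around to 4 in a 32-bit register.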
Other state elements:
- IMEM: The processor must read the RV32I instruction from the read-only IMEM during the IF (Instruction Fetch) phase.
- DMEM: The processor does not additionally access memory via a load or store. The add instruction does not participate in the MEM phase of the five-step process described in the Introduction.

Figure 1: For now, we disconnect DMEM since it is unused for add. We will add it back when we discuss loads and stores.
3 Tracing the add Datapath
Given the above analysis, we can now connect wires between key elements of our processor. Use the menu bar to trace through the animation or download a copy of the PDF/PPTX file.
Instruction Fetch:

- On the rising clock edge, the pc wire updates to the address of the instruction to execute in this cycle. It feeds into IMEM, which, after some delay, updates the inst output signal.
- Increment the PC to the next instruction. The pc wire also feeds into a small adder that adds 4. The output of this small adder is wired to the input of the PC register, set up and ready to update on the next rising clock edge.
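A software model of this step makes the byte-versus-word addressing explicit. The sketch below assumes a hypothetical IMEM stored as a Python list of 32-bit words, so the byte address held in the PC is divided by 4 to find the word; the function name fetch is illustrative, and the two words are our own encodings of add x18 x18 x10 and add x18 x18 x18.

```python
def fetch(imem: list[int], pc: int) -> tuple[int, int]:
    """Return (inst, next_pc) for the instruction at byte address pc."""
    # Assumes 4-byte-aligned instructions (base RV32I, no compressed extension).
    assert pc % 4 == 0, "instructions are 4-byte aligned"
    inst = imem[pc // 4]               # IMEM read: word index = byte address / 4
    next_pc = (pc + 4) & 0xFFFF_FFFF   # the small "+4" adder feeding the PC register
    return inst, next_pc


imem = [0x00A90933, 0x01290933]  # add x18 x18 x10; add x18 x18 x18
inst, next_pc = fetch(imem, 4)
print(hex(inst), next_pc)  # 0x1290933 8
```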
Instruction Decode: We only have one instruction, so decoding simply means extracting the bits that identify the registers. Using the green card and the R-Type format, we introduce a splitter on the inst signal to “index” into the RegFile as follows:

- Wire inst[11:7] (bits 7 through 11, inclusive) to the rd input of the RegFile.
- Wire inst[19:15] to the rs1 input of the RegFile.
- Wire inst[24:20] to the rs2 input of the RegFile.
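The splitter can be mimicked in software with shifts and masks. This sketch extracts the same three 5-bit fields from an R-type word; the helper names bits and decode_r_type are hypothetical, and the example word 0x00A90933 is our own encoding of add x18 x18 x10 (opcode 011 0011, funct3 000, funct7 000 0000).

```python
def bits(inst: int, hi: int, lo: int) -> int:
    """Return inst[hi:lo] (inclusive), like one output of a splitter."""
    width = hi - lo + 1
    return (inst >> lo) & ((1 << width) - 1)


def decode_r_type(inst: int) -> dict[str, int]:
    """Split an R-type instruction into the register fields the RegFile needs."""
    return {
        "rd": bits(inst, 11, 7),
        "rs1": bits(inst, 19, 15),
        "rs2": bits(inst, 24, 20),
    }


print(decode_r_type(0x00A90933))  # {'rd': 18, 'rs1': 18, 'rs2': 10}
```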
After some delay, the RegFile updates the rdata1 and rdata2 signals to the values of R[rs1] and R[rs2], where rs1 and rs2 are determined from the instruction inst.

Execute: Our ALU (see below) should perform the addition operation. For now, we just mark this block as an Adder. Feed the two RegFile output signals into the A and B inputs of the “ALU.” After some delay, the ALU output updates to R[rs1] + R[rs2].

Memory: (We don’t access memory, so skip this.)
Write Back: Connect the output signal of the ALU to the wdata input signal of the RegFile. This signal should be set up and ready to update on the next rising clock edge.
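Because the RegFile and the PC both update on the same rising clock edge, a cycle-level model should compute every next-state value from the old state before committing any of them. The sketch below runs the first two instructions of the example program from Section 2 that way. The function name run_add_program and the initial values of x10 and x18 are assumptions for illustration, and instructions are given pre-decoded as (rd, rs1, rs2) tuples to keep the focus on timing.

```python
MASK32 = 0xFFFF_FFFF


def run_add_program(program: list[tuple[int, int, int]], regs: list[int]) -> int:
    """Simulate one add per clock cycle; return the final PC."""
    pc = 0
    while pc // 4 < len(program):
        rd, rs1, rs2 = program[pc // 4]        # IF + ID (pre-decoded here)
        rdata1, rdata2 = regs[rs1], regs[rs2]  # combinational RegFile reads (old values)
        wdata = (rdata1 + rdata2) & MASK32     # EX: the adder output drives wdata
        next_pc = pc + 4                       # output of the PC's "+4" adder
        # Rising clock edge: commit the RegFile write and the new PC together.
        if rd != 0:
            regs[rd] = wdata
        pc = next_pc
    return pc


regs = [0] * 32
regs[10], regs[18] = 3, 4
final_pc = run_add_program([(18, 18, 10), (18, 18, 18)], regs)
print(regs[18], final_pc)  # 14 8
```

The first cycle computes x18 = 4 + 3 = 7 and commits it at the clock edge; the second cycle then reads the committed value and computes 7 + 7 = 14.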
4 Arithmetic Logic Unit (ALU)
In the previous chapter we implemented a basic four-operation ALU. In the full RISC-V implementation, our ALU (Figure 2) must support all operations for R-Type instructions:

Figure 2: ALU block.
Table 1: Signals for the ALU block

| Name | Direction | Bit Width | Description |
|---|---|---|---|
| A | Input | 32 | Data to use for Input A in the ALU operation |
| B | Input | 32 | Data to use for Input B in the ALU operation |
| ALUSel | Input | 4 | Selects which operation the ALU should perform (see Table 2) |
| ALUResult | Output | 32 | Result of the ALU operation |
4.1 Course Project Details
Below, we detail the ALU operations that must be implemented for the course project’s datapath. We encourage revisiting this section after reading a few more example datapath traces.
Table 2: Operations for the ALU block in the course project
| ALUSel Value (for Project) | Operation | ALU Function |
|---|---|---|
| 0 | add | ALUResult = A + B |
| 1 | sll | ALUResult = A << B[4:0] |
| 2 | slt | ALUResult = (A < B (signed)) ? 1 : 0 |
| 3 | Unused | - |
| 4 | xor | ALUResult = A ^ B |
| 5 | srl | ALUResult = (unsigned) A >> B[4:0] |
| 6 | or | ALUResult = A \| B |
| 7 | and | ALUResult = A & B |
| 8 | mul | ALUResult = (signed) (A * B)[31:0] |
| 9 | mulh | ALUResult = (signed) (A * B)[63:32] |
| 10 | Unused | - |
| 11 | mulhu | ALUResult = (unsigned) (A * B)[63:32] |
| 12 | sub | ALUResult = A - B |
| 13 | sra | ALUResult = (signed) A >> B[4:0] |
| 14 | Unused | - |
| 15 | bsel | ALUResult = B |
Observations/reminders:

- When performing shifts, only the lower 5 bits of B are needed, because shift amounts range only from 0 to 31.
- The comparator component might be useful for implementing instructions that involve comparing inputs. See the branch implementation later in this chapter.
- A multiplexer (MUX) might be useful when deciding between operation outputs (recall our basic 4-operation ALU). Consider first computing the results of all operations, and then using ALUSel to select which one to output.
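Putting Table 2 and these hints together, one way to prototype the ALU in software before wiring it is to compute every operation’s result and then pick one with ALUSel, mirroring the MUX suggestion above. This is a Python sketch, not the project’s required implementation; the names alu and to_signed are hypothetical, and ALUSel values with no operation (3, 10, 14) return 0 here as an arbitrary choice.

```python
MASK32 = 0xFFFF_FFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x8000_0000 else x


def alu(a: int, b: int, alu_sel: int) -> int:
    """Model of the ALU in Table 2; a and b are 32-bit unsigned bit patterns."""
    shamt = b & 0x1F  # only B[4:0] matters for shifts
    signed_product = to_signed(a) * to_signed(b)
    unsigned_product = a * b
    # Compute every result first, as separate hardware blocks would...
    results = {
        0: a + b,                             # add
        1: a << shamt,                        # sll
        2: int(to_signed(a) < to_signed(b)),  # slt
        4: a ^ b,                             # xor
        5: a >> shamt,                        # srl (logical, since a is unsigned)
        6: a | b,                             # or
        7: a & b,                             # and
        8: signed_product,                    # mul (low 32 bits kept below)
        9: signed_product >> 32,              # mulh
        11: unsigned_product >> 32,           # mulhu
        12: a - b,                            # sub
        13: to_signed(a) >> shamt,            # sra (Python >> on ints is arithmetic)
        15: b,                                # bsel
    }
    # ...then let ALUSel select one, like a MUX, and truncate to 32 bits.
    return results.get(alu_sel, 0) & MASK32


print(hex(alu(0xFFFF_FFFF, 1, 0)))            # add wraps around: 0x0
print(alu(0xFFFF_FFFF, 1, 2))                 # slt: -1 < 1, so 1
print(hex(alu(0x8000_0000, 33, 13)))          # sra uses 33 & 31 = 1: 0xc0000000
print(hex(alu(0xFFFF_FFFF, 0xFFFF_FFFF, 11))) # mulhu: 0xfffffffe
```

Python integers are unbounded, so the final mask plays the role of the 32-bit output bus; in hardware the MUX would select among 32-bit results directly.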
4.2 General Multiplication
An ALU that implements the mul, mulh, and mulhu instructions can support parts of the RISC-V “M” extension.
| Instruction | Name | Description | Type | Opcode | Funct3 | Funct7 |
|---|---|---|---|---|---|---|
| mul rd rs1 rs2 | MULtiply | R[rd] = (R[rs1] * R[rs2])[31:0] | R | 011 0011 | 000 | 000 0001 |
| mulh rd rs1 rs2 | MULtiply Higher Bits | R[rd] = (R[rs1] * R[rs2])[63:32] (Signed) | R | 011 0011 | 001 | 000 0001 |
| mulhu rd rs1 rs2 | MULtiply Higher Bits (Unsigned) | R[rd] = (R[rs1] * R[rs2])[63:32] (Unsigned) | R | 011 0011 | 011 | 000 0001 |
The product of two 32-bit numbers can be up to 64 bits wide, but our data lines are limited to 32 bits, so mul returns the lower 32 bits of the product while mulh and mulhu return the upper 32 bits. The Multiplier component has a Carry Out output (described as “the upper bits of the product”), which might be particularly useful for certain multiply operations.
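If the multiplier’s upper output treats both inputs as unsigned (an assumption worth checking against your component’s documentation), it gives mulhu directly, and mulh can be derived from it with a two’s-complement correction. Reading a negative 32-bit input as signed subtracts 2^32 from its value, so the signed upper word equals the unsigned upper word minus B when A is negative, and minus A when B is negative, modulo 2^32. The sketch below checks that identity in Python; all function names are hypothetical.

```python
import random

MASK32 = 0xFFFF_FFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x8000_0000 else x


def mulhu(a: int, b: int) -> int:
    """Upper 32 bits of the unsigned 64-bit product (assumed Carry Out behavior)."""
    return (a * b) >> 32


def mulh_from_mulhu(a: int, b: int) -> int:
    """Signed upper 32 bits, built only from the unsigned result plus a correction."""
    hi = mulhu(a, b)
    if a & 0x8000_0000:  # A is negative when read as signed
        hi -= b
    if b & 0x8000_0000:  # B is negative when read as signed
        hi -= a
    return hi & MASK32


def mulh_reference(a: int, b: int) -> int:
    """Signed upper 32 bits computed directly with Python's unbounded integers."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32


# Spot-check the identity on edge cases and random 32-bit patterns.
cases = [(0, 0), (MASK32, MASK32), (0x8000_0000, 0x8000_0000), (0x8000_0000, 1), (7, MASK32)]
cases += [(random.getrandbits(32), random.getrandbits(32)) for _ in range(1000)]
assert all(mulh_from_mulhu(a, b) == mulh_reference(a, b) for a, b in cases)
print("identity holds on", len(cases), "cases")
```

The lower 32 bits of a product are the same whether the inputs are read as signed or unsigned, so mul needs no such correction.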