
1 Learning Outcomes

2 Building a processor that adds

To start off, let’s build the simplest processor we can: one that executes only a single instruction, add. Programs for this processor are just a series of add instructions:

add x18 x18 x10
add x18 x18 x18
add ...
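For concreteness, the sketch below (in Python, chosen only for illustration) shows what IMEM would hold for the program above. It packs an add into a 32-bit R-Type word using the field layout from the green card: funct7 in bits 25 through 31, rs2 in bits 20 through 24, rs1 in bits 15 through 19, funct3 in bits 12 through 14, rd in bits 7 through 11, and the opcode in bits 0 through 6. The helper name encode_add is hypothetical, and the sketch ignores assembler details such as register aliases and commas.

```python
# Hypothetical helper for illustration; not part of any course tool.
def encode_add(rd: int, rs1: int, rs2: int) -> int:
    """Pack `add rd rs1 rs2` into a 32-bit R-Type instruction word."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError("register numbers must fit in 5 bits")
    opcode = 0b0110011  # 011 0011: R-Type arithmetic
    funct3 = 0b000      # add (and sub) use funct3 = 000
    funct7 = 0b0000000  # 000 0000 selects add rather than sub
    return (
        (funct7 << 25)    # bits 25..31
        | (rs2 << 20)     # bits 20..24
        | (rs1 << 15)     # bits 15..19
        | (funct3 << 12)  # bits 12..14
        | (rd << 7)       # bits 7..11
        | opcode          # bits 0..6
    )


# The two-instruction program shown above: add x18 x18 x10; add x18 x18 x18
program = [encode_add(18, 18, 10), encode_add(18, 18, 18)]
for word in program:
    print(f"0x{word:08x}")
# 0x00a90933
# 0x01290933
```

The register numbers land in exactly the bit ranges that the decode step's splitter pulls back out.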

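In order to support add in our datapath, we consider the two state elements changed by this instruction’s operations:

  • PC: updated to PC + 4 so that the next instruction is fetched on the following cycle.

  • RegFile: register rd is written with R[rs1] + R[rs2].

Other state elements:

  • IMEM: read, but never written, to fetch the instruction.

  • DMEM: not accessed by add.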


Figure 1: For now, we disconnect DMEM since it is unused for add. We will add it back when we discuss loads and stores.

3 Tracing the add Datapath

Given the above analysis, we can now connect wires between the key elements of our processor.

  1. Instruction Fetch:

    • On the rising clock edge, the pc wire updates to the address of the instruction to execute in this cycle. It feeds into IMEM, which, after some delay, updates the inst output signal.

    • Increment the PC to point to the next instruction. The pc wire also feeds into a small adder that adds 4. The output of this small adder is wired to the input of the PC register, set up and ready to update on the next rising clock edge.

  2. Instruction Decode: We only have one instruction, so decoding simply means extracting the bit fields that identify the registers. Using the green card and our R-Type format, we introduce a splitter on the inst signal to “index” into the RegFile as follows:

    • Wire inst[7:11] (bits 7 through 11, inclusive) to the rd input of RegFile.

    • Wire inst[15:19] to the rs1 input of RegFile.

    • Wire inst[20:24] to the rs2 input of RegFile.

    After some delay, the RegFile updates the rdata1 and rdata2 signals to the values of R[rs1] and R[rs2], where rs1 and rs2 are determined from the instruction inst.

  3. Execute: Our ALU (see below) should perform addition. For now, we just mark this block as an Adder. Feed the two RegFile output signals into the A and B inputs of the “ALU.” After some delay, the ALU output updates to R[rs1] + R[rs2].

  4. Memory: (We don’t access memory, so skip this.)

  5. Write Back: Connect the output signal of the ALU to the wdata input signal of the RegFile. This value is then set up and ready to be written into register rd on the next rising clock edge.
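The five steps above can be summarized as a behavioral sketch of one clock cycle. This is a minimal Python model under stated assumptions, not the course’s reference implementation: the class AddOnlyCPU and the helper bits are hypothetical, IMEM is modeled as a list of 32-bit words indexed by PC // 4, every word is assumed to be an add, and x0 is kept at zero as RISC-V requires. In hardware, fetch, decode, and execute are combinational delays within the cycle; the model simply performs them in order and then commits the RegFile write and the PC update together, mimicking the next rising clock edge.

```python
MASK32 = 0xFFFFFFFF


def bits(value: int, hi: int, lo: int) -> int:
    """Return bits lo through hi of value (inclusive), like a splitter output."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


class AddOnlyCPU:
    """Hypothetical behavioral model of the add-only datapath (PC, IMEM, RegFile)."""

    def __init__(self, imem: list[int]) -> None:
        self.pc = 0
        self.imem = imem       # list of 32-bit instruction words
        self.regs = [0] * 32   # RegFile; x0 always reads as 0

    def step(self) -> None:
        """Simulate one clock cycle of the add-only processor."""
        # 1. Instruction Fetch: the PC is a byte address; one word is 4 bytes.
        inst = self.imem[self.pc // 4]
        next_pc = (self.pc + 4) & MASK32    # the small +4 adder
        # 2. Instruction Decode: splitter on inst (R-Type register fields).
        rd = bits(inst, 11, 7)
        rs1 = bits(inst, 19, 15)
        rs2 = bits(inst, 24, 20)
        rdata1, rdata2 = self.regs[rs1], self.regs[rs2]
        # 3. Execute: the "ALU" is just an adder; results wrap to 32 bits.
        alu_result = (rdata1 + rdata2) & MASK32
        # 4. Memory: add does not access DMEM, so nothing happens here.
        # 5. Write Back and PC update, committed together at the next rising edge.
        if rd != 0:                         # writes to x0 are discarded
            self.regs[rd] = alu_result
        self.pc = next_pc


# Usage: add x18 x18 x10; add x18 x18 x18, starting with x10 = 5 and x18 = 1.
cpu = AddOnlyCPU([0x00A90933, 0x01290933])
cpu.regs[10], cpu.regs[18] = 5, 1
cpu.step()
cpu.step()
print(cpu.regs[18], cpu.pc)  # 12 8
```

Computing next_pc and alu_result before touching any state mirrors the fact that both registers (PC and RegFile) only change on the clock edge.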

4 Arithmetic Logic Unit (ALU)

In the previous chapter we implemented a basic four-operation ALU. In the full RISC-V implementation, our ALU (Figure 2) must support all operations for R-Type instructions:


Figure 2: ALU Block.

Table 1: Signals for ALU Block

| Name | Direction | Bit Width | Description |
| --- | --- | --- | --- |
| A | Input | 32 | Data to use for Input A in the ALU operation |
| B | Input | 32 | Data to use for Input B in the ALU operation |
| ALUSel | Input | 4 | Selects which operation the ALU should perform (see Table 2) |
| ALUResult | Output | 32 | Result of the ALU operation |

4.1 Course Project Details

Below, we detail the ALU operations that must be implemented for the course project’s datapath. We encourage revisiting this section after reading a few more example datapath traces.

Table 2: Operations for ALU Block for the course project

| ALUSel Value (for Project) | Operation | ALU Function |
| --- | --- | --- |
| 0 | add | ALUResult = A + B |
| 1 | sll | ALUResult = A << B[4:0] |
| 2 | slt | ALUResult = (A < B (signed)) ? 1 : 0 |
| 3 | Unused | - |
| 4 | xor | ALUResult = A ^ B |
| 5 | srl | ALUResult = (unsigned) A >> B[4:0] |
| 6 | or | ALUResult = A \| B |
| 7 | and | ALUResult = A & B |
| 8 | mul | ALUResult = (signed) (A * B)[31:0] |
| 9 | mulh | ALUResult = (signed) (A * B)[63:32] |
| 10 | Unused | - |
| 11 | mulhu | ALUResult = (unsigned) (A * B)[63:32] |
| 12 | sub | ALUResult = A - B |
| 13 | sra | ALUResult = (signed) A >> B[4:0] |
| 14 | Unused | - |
| 15 | bsel | ALUResult = B |

Observations/reminders:

  • When performing shifts, only the lower 5 bits of B are needed, because a 5-bit shift amount covers 0 through 31, which is every meaningful shift of a 32-bit value.

  • The comparator component might be useful for implementing instructions that involve comparing inputs. See the branch implementation later in this chapter.

  • A multiplexer (MUX) might be useful when deciding between operation outputs (recall our basic 4-operation ALU). Consider computing the results of all operations first, and then using ALUSel to output the one you need.
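To make the “compute everything, then select” suggestion concrete, here is a minimal behavioral sketch of Table 2 in Python. It is a reference model for reasoning about expected outputs, not a circuit: 32-bit wires are modeled as non-negative integers masked to 32 bits, a dictionary lookup stands in for the MUX driven by ALUSel, and the names alu and to_signed are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a: int, b: int, alu_sel: int) -> int:
    """Behavioral sketch of the project ALU (Table 2); returns a 32-bit pattern."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F                # only B[4:0] matters for shifts
    sa, sb = to_signed(a), to_signed(b)
    # Compute every operation's result, then let ALUSel pick one (the "MUX").
    results = {
        0: a + b,                   # add
        1: a << shamt,              # sll
        2: 1 if sa < sb else 0,     # slt (signed comparison)
        4: a ^ b,                   # xor
        5: a >> shamt,              # srl: a is non-negative, so this is logical
        6: a | b,                   # or
        7: a & b,                   # and
        8: sa * sb,                 # mul: low 32 bits kept by the mask below
        9: (sa * sb) >> 32,         # mulh: upper 32 bits of the signed product
        11: (a * b) >> 32,          # mulhu: upper 32 bits of the unsigned product
        12: a - b,                  # sub
        13: sa >> shamt,            # sra: >> on a negative int is arithmetic
        15: b,                      # bsel
    }
    if alu_sel not in results:
        raise ValueError(f"ALUSel {alu_sel} is unused")
    # Masking keeps only the 32 bits an ALUResult wire can carry.
    return results[alu_sel] & MASK32
```

For example, alu(3, 5, 12) yields 0xFFFFFFFE (that is, -2), and alu(0x80000000, 31, 13) yields 0xFFFFFFFF while the same inputs with ALUSel 5 yield 1. A real circuit would build these results from Logisim components rather than Python operators, but the select-at-the-end structure is the same.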


4.2 General Multiplication

An ALU that implements the mul, mulh, and mulhu instructions can support parts of the RISC-V “M” extension.

| Instruction | Name | Description | Type | Opcode | Funct3 | Funct7 |
| --- | --- | --- | --- | --- | --- | --- |
| mul rd rs1 rs2 | MULtiply | R[rd] = (R[rs1] * R[rs2])[31:0] | R | 011 0011 | 000 | 000 0001 |
| mulh rd rs1 rs2 | MULtiply Higher Bits | R[rd] = (R[rs1] * R[rs2])[63:32] (Signed) | R | 011 0011 | 001 | 000 0001 |
| mulhu rd rs1 rs2 | MULtiply Higher Bits (Unsigned) | R[rd] = (R[rs1] * R[rs2])[63:32] (Unsigned) | R | 011 0011 | 011 | 000 0001 |

The result of multiplying two 32-bit numbers can be up to 64 bits long, but our data lines are only 32 bits wide, so mulh and mulhu are used to get the upper 32 bits of the product. The Multiplier component has a Carry Out output (described as “the upper bits of the product”), which might be particularly useful for certain multiply operations.
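A small worked example shows why the upper half needs its own instructions, and why mulh and mulhu differ. The Python sketch below (helper names hypothetical) forms the full 64-bit product under both signed and unsigned interpretations and slices it the way mul, mulh, and mulhu do. The lower 32 bits are the same either way, which is why a single mul suffices, but the upper 32 bits generally are not.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def split_product(a: int, b: int) -> dict[str, int]:
    """Slice the full 64-bit product the way mul, mulh, and mulhu do."""
    unsigned_product = (a & MASK32) * (b & MASK32)  # up to 64 bits
    signed_product = to_signed(a) * to_signed(b)     # may be negative
    return {
        "mul": signed_product & MASK32,              # bits 31..0 (same if unsigned)
        "mulh": (signed_product >> 32) & MASK32,     # bits 63..32, signed
        "mulhu": (unsigned_product >> 32) & MASK32,  # bits 63..32, unsigned
    }


for a, b in [(0xFFFFFFFF, 0xFFFFFFFF), (0x00010000, 0x00010000)]:
    r = split_product(a, b)
    print(f"a=0x{a:08x} b=0x{b:08x} -> "
          + ", ".join(f"{k}=0x{v:08x}" for k, v in r.items()))
# a=0xffffffff b=0xffffffff -> mul=0x00000001, mulh=0x00000000, mulhu=0xfffffffe
# a=0x00010000 b=0x00010000 -> mul=0x00000000, mulh=0x00000001, mulhu=0x00000001
```

In the first case the bit pattern 0xFFFFFFFF is -1 when signed, so the signed product is 1 and its upper half is zero, while the unsigned product 0xFFFFFFFE00000001 has a nonzero upper half. In the second case the product 2^32 does not fit in 32 bits at all, so mul alone returns 0 and only the upper-half instructions recover the 1.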