CSci 4203 Fall 2023 Lab Assignment 2
(Note: There are 16 pages in this document)
1. Goal
In this lab assignment, you are provided with a semi-complete behavioral implementation of a MIPS-like instruction pipeline that supports hazard detection and load forwarding. You are asked to make some local changes to add functionality, support the two additional instructions specified in Section 5, and pass all provided test cases listed in Section 4. The first problem (Problem 0) in Section 5 is a “warm-up” exercise to get you started and to familiarize you with the pipeline.
To ease the design task in a complex system, you should use the power of “abstraction”, i.e. you only need to add or change the components and signals required by your tasks, and can treat all other components and signals as “black boxes”. You will find that the parts of the provided pipeline you are asked to change or add are very limited in scope. Better yet, the places in the design that you need to change are marked with suggestions in the form of “//TODO” comments. You can start your design changes from those places.
You should read Section 2 “Background” first to learn the key components and signals in the pipeline. Try a couple of the test cases described in Section 2.2, and make sure the provided pipeline works as you expect. Section 3 shows step by step how to simulate and validate those test cases. Then you are ready to work on the three assignment problems listed in Section 5. Run the automatic grader described in Section 4 when you complete your design changes, and verify that they work correctly. Submit your revised design files, and you are done! After this lab assignment, you should have a very solid understanding of how pipelining works in all of the microprocessors being used today.
Try to start early, and don’t expect your design to “magically” work on your first few tries. Don’t wait until the last few days before the deadline. If you have questions, take advantage of the office hours on every weekday and the Discussion forum on Canvas. The schedule of the office hours is listed in the course syllabus on Canvas. Good luck!
Reading
Read Section 4.14 of the textbook, or download it from the following link (from the previous edition of the textbook):
(http://booksite.elsevier.com/9780124077263/downloads/advance_contents_and_appendices/section_4.13.pdf).
2. Background
The provided MIPS pipeline consists of five stages, namely IF (instruction fetch), ID (instruction decode), EX (execute), MEM (memory) and WB (writeback). These stages have been partially implemented in the SystemVerilog files provided to you. However, regular Verilog is sufficient to solve this assignment.
2.1 Given Pipeline Implementation:
The provided pipeline components (pipeline registers, control signals, data paths etc.) are defined at the start of the program. The memory and register state are initialized inside an ‘initial’ block. The pipeline stages are implemented inside an ‘always @(posedge clock)’ block, which causes their states to be updated on every positive clock edge.
Figure 1: Program Counter Register Update (PCR) Stage
The first pipeline stage is the program counter register update (PCR) stage (see Figure 1 above). This stage updates the value of the program counter (PC) register. The input ports are clk, branchTaken, stall and branchPCOffset. The positive edge of clk triggers an update to the PC pipeline register. The stall and branchTaken signals select the next PC value. If the stall signal is active, the next PC value is the same as the PC in the previous clock cycle. If the stall signal is inactive but the branchTaken signal is active, the next PC value is PC + branchPCOffset. If neither the stall nor the branchTaken signal is active, the next PC value is PC + 4. Figure 1 represents this implementation as two switches connected in series, switch 1 and switch 2. The part you need to modify in the code corresponds to switch 1 (shaded in yellow). See pcr.sv for the Verilog code.
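The next-PC selection described above can be sketched as a small reference model. This is a Python sketch, not the code in pcr.sv; the function name next_pc and the wrap-around to 32 bits are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: the PC is a 32-bit register


def next_pc(pc, stall, branch_taken, branch_pc_offset):
    """Return the PC value for the next cycle, following the PCR description."""
    if stall:
        # A stall holds the PC at its current value.
        return pc
    if branch_taken:
        # A taken branch redirects the PC by the branch offset.
        return (pc + branch_pc_offset) & MASK32
    # Default case: advance to the next sequential instruction.
    return (pc + 4) & MASK32
```

Note that the stall check comes first, so a stalled cycle keeps the PC even when branchTaken is also active, which matches the order in which the text lists the cases.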
Figure 2: Instruction Fetch (IF) Stage
The Instruction Fetch (IF) stage (Figure 2) loads a new instruction into the IFIDIR register on every positive clock edge. The fetch stage has four inputs: clk, branchTaken, stall and PC. It has one output, the IFIDIR register value. The branchTaken signal indicates that a branch has changed the PC to a branch target. The stall signal indicates that there is a hazard between the decode stage and one of the other stages. Lastly, PC is the current PC value from the PCR stage.
Every cycle, switch 2 checks the branchTaken signal. If it is true, a nop is injected into the IFIDIR register. Otherwise, switch 1 checks for a stall. If there is a stall, IFIDIR keeps the same value as in the previous cycle. Otherwise, IFIDIR is updated with the new instruction word stored at instruction memory location PC >> 2. See fetch.sv for the Verilog code corresponding to the fetch stage. No modifications need to be made in this stage.
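As a rough illustration of the two switches in the fetch stage, the following Python sketch models the IFIDIR update. The function name next_ifidir and the list-based instruction memory are hypothetical; the all-zero nop encoding follows the statement in Section 3.2 that a zero word represents a nop.

```python
NOP = 0  # an all-zero instruction word represents a nop in this handout


def next_ifidir(ifidir, pc, stall, branch_taken, imem):
    """Return the next IFIDIR value; imem is a list of 32-bit words."""
    if branch_taken:
        # Switch 2: a taken branch squashes the fetched instruction.
        return NOP
    if stall:
        # Switch 1: a stall keeps the previously fetched instruction.
        return ifidir
    # Normal fetch: PC is a byte address, so the word index is PC >> 2.
    return imem[pc >> 2]
```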
Figure 3: Instruction Decode (ID) Stage
The Instruction Decode (ID) stage (Figure 3) has six inputs and five outputs. The six inputs are clk, stall, IDEXAfromWB, IDEXBfromWB, IFIDIR and MEMWBValue.
clk and stall have the usual meanings as discussed above. IDEXAfromWB and IDEXBfromWB are signals indicating that there is an incoming writeback to the rs or rt register, respectively, of the instruction being decoded. IFIDIR is the value of the pipeline register between the IF and ID stages. MEMWBValue contains the data from the WB stage that needs to be written into the register file. The outputs of the ID stage are IDEXIR, IDEXA, IDEXB, branchTaken and branchPCOffset.
There are five components in the decode section. Components A and B update the IDEXA and IDEXB pipeline registers with a nop, with the values of R[rs] or R[rt], or with the data forwarded from the WB stage. Component C updates the IDEXIR pipeline register with either a nop or the IFIDIR value received from the IF stage. Component D needs to be added by you to update the contents of R[rt] according to our definition of the BEQINIT instruction (see Problem 2). Component E needs to create the control signals branchTaken and branchPCOffset by evaluating whether the branch condition R[rs] == R[rt] is true. The changes in logic you need to make are shaded in yellow. See decode.sv for the Verilog implementation.
Figure 4: Execute (EX) Stage
The Execute (EX) stage performs arithmetic operations. For example, an ALU-type instruction like ADD carries out its addition in the execute stage.
The execute stage (see Figure 4) has eleven inputs and four outputs. The eleven inputs include the six bypass signals used in the forwarding circuit: bypassAfromMEM, bypassAfromALUinWB, bypassAfromLWinWB, bypassBfromMEM, bypassBfromALUinWB and bypassBfromLWinWB. The remaining input signals are clk, IDEXIR, IDEXA, IDEXB and MEMWBValue. The output signals come from the pipeline registers EXMEMB, EXMEMIR and EXMEMALUOut.
There are four components. Component A is the forwarding unit, which creates Ain and Bin, the values of R[rs] and R[rt] for the instruction that needs to be executed. Component B carries out a computation on Ain and Bin and stores the result in the EXMEMALUOut pipeline register. The ADD instruction is already implemented, but you’ll need to implement the other ALU operations and the CINDC and BEQINIT instructions (see Problems 0, 1 and 2). Component C propagates the Bin signal into the EXMEMB pipeline register for use by the next pipeline stage, MEM. Component D propagates the IDEXIR signal into the EXMEMIR pipeline register for use by the MEM stage. You can implement the above logic (shaded in yellow) in execute.sv.
Figure 5: Forwarding Logic
The forwarding circuit inside the ALU is presented in Figure 5. If any hazards are detected relative to the MEM stage and the WB stage, then Ain and Bin will be modified to account for those hazards. The forwarding logic has ten inputs and two outputs. The first group of signals are the ‘bypass’ control signals. They indicate whether forwarding needs to take place from MEM to EX or from WB to EX. bypassAfromMEM means that forwarding needs to take place from MEM to the first input of the ALU, Ain. If bypassAfromALUinWB is true, forwarding needs to take place from the WB stage to Ain, due to an ALU-type instruction in the WB stage. If bypassAfromLWinWB is true, forwarding needs to happen due to an LW instruction in the WB stage. Similarly, there are three bypass signals for the Bin input. The IDEXA and IDEXB signals contain the values of R[rs] and R[rt] obtained from the ID stage. They are assigned to Ain and Bin respectively if none of the forwarding signals are true. EXMEMALUOut is the value that is forwarded from the MEM stage. MEMWBValue is the value that is forwarded from the WB stage. You can implement the above logic (shaded in yellow) in forward.sv.
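The selection of one ALU operand from the three bypass signals can be sketched as follows. This Python model is only an illustration under two stated assumptions that the handout does not spell out: forwarding from MEM takes priority over forwarding from WB (MEM holds the more recent result), and both WB bypass signals select MEMWBValue. The function name forward_operand is hypothetical.

```python
def forward_operand(idex_value, bypass_from_mem, bypass_from_alu_in_wb,
                    bypass_from_lw_in_wb, exmem_alu_out, memwb_value):
    """Pick the value for Ain (or Bin) given the bypass control signals."""
    if bypass_from_mem:
        # Assumption: the MEM stage has priority over the WB stage.
        return exmem_alu_out
    if bypass_from_alu_in_wb or bypass_from_lw_in_wb:
        # Assumption: both WB bypass cases forward MEMWBValue.
        return memwb_value
    # No hazard: use the register value read in the ID stage.
    return idex_value
```

The same function would be applied once with the A signals and IDEXA to produce Ain, and once with the B signals and IDEXB to produce Bin.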
Other Phases:
The Memory (MEM) stage will access the memory for LW/SW instructions and do nothing for the others. The Writeback (WB) stage will access the register file and write to it.
There is also a control block, which generates the control signals which are input to these blocks. The code for the above stages need not be modified. All the modules are connected together in cpu.sv, which also does not need to be modified.
2.2 Test Cases:
In order to test the implementation, test cases have been provided in two folders, named specific_tests and random_tests. Each specific test targets a single instruction. The random tests contain random combinations of multiple instructions. Each test case consists of five .dat files, namely dmem.dat, imem.dat, mem_result_expected.dat, regs.dat, and regs_result_expected.dat. Three of these files contain the initial state of the memory and registers: dmem.dat contains the initial data memory state, imem.dat contains the initial instruction memory state, and regs.dat contains the initial register state. For regs.dat, the first line corresponds to r0, the second line to r1 and so on. Similarly, for imem.dat and dmem.dat, the first line corresponds to byte 0, the second line to byte 4 and so on. There are 32 registers in regs.dat and 32 4-byte aligned memory locations in each of imem.dat and dmem.dat.
The other two files, mem_result_expected.dat and regs_result_expected.dat, contain the expected final memory and register state once the execution has completed 2000 cycles and exited.
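To make the file layout concrete, here is a minimal Python sketch that parses the contents of a .dat file into integers and maps a line index to its byte address. The helper names parse_dat, load_dat and byte_address are hypothetical and not part of the handout.

```python
def parse_dat(text):
    """Parse .dat contents: one 32-bit binary word per line."""
    return [int(token, 2) for token in text.split()]


def load_dat(path):
    """Read a .dat file from disk and return its words as integers."""
    with open(path) as f:
        return parse_dat(f.read())


def byte_address(line_index):
    """Line i of imem.dat or dmem.dat holds the word at byte address 4 * i."""
    return 4 * line_index
```

For regs.dat, the list index is simply the register number, so the value of r4 would be `load_dat("regs.dat")[4]`.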
3. Executing the Test Cases Manually
Create a new Verilog RTL project in Vivado. Import all the .sv files from the handout into your project. Change the paths in parameters.sv to the absolute paths of the .dat files corresponding to the test case you want to simulate. The .dat files are found in the specific_tests and random_tests folders of the handout. Then execute the test case in Vivado using ‘Run Behavioral Simulation’. The resulting memory and register state after simulation will be stored in two newly created files, mem_result.dat and regs_result.dat, at the location you specified. Note that there are also two files, mem_result_expected.dat and regs_result_expected.dat. You can compare the generated result files to the expected ones manually.
The absolute-path specification differs slightly depending on whether you are running Vivado on Windows or on Linux. On Windows, the absolute path needs to be specified using “\\” separators. For example, one possible setting for regs.dat on my Windows laptop was
filename="C:\\Users\\Kartik\\Documents\\lab2\\regs.dat";
On Linux, the “/” separator should be used. For example
filename="/home/ramkr004/lab2/regs.dat";
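Comparing the generated result files against the expected ones by eye is error-prone, so a small script can help. The following Python sketch reports every line that differs; the helper names read_lines and diff_state are hypothetical, and the file names in the usage note are the ones mentioned above.

```python
from itertools import zip_longest


def read_lines(path):
    """Return the non-empty lines of a .dat file, stripped of whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def diff_state(actual, expected):
    """Return (line_index, actual, expected) for every mismatching line."""
    mismatches = []
    for index, (got, want) in enumerate(zip_longest(actual, expected)):
        if got != want:
            # A missing line on either side shows up as None.
            mismatches.append((index, got, want))
    return mismatches
```

A possible use is `diff_state(read_lines("regs_result.dat"), read_lines("regs_result_expected.dat"))`; an empty list means the two files agree. For the register files, the line index is the register number.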
3.1 Observing Waveforms
A useful way to debug your modifications to the processor is to observe the waveforms of its different components.
i. Change the paths of the filenames, as mentioned above. One example path is shown in the screenshot below, which points to a test case in the specific_tests folder that uses an ‘addition’ instruction.
ii. Run the simulation in Vivado using ‘Run Behavioral Simulation’.
Click on “Run Simulation > Run Behavioral Simulation”. This should open the following window, as shown below.
Click on ‘Untitled 1’, which should open a waveform window. Then click ‘cpu’ in the Scope window. The Vivado display should now look like the following:
Drag and drop the clock signal, the pipeline intermediate register signals IFIDIR, IDEXIR, EXMEMIR and MEMWBIR, and the program counter PC into the ‘Name’ column of the waveform window. Then click on the ‘Relaunch Simulation’ icon on the top toolbar to populate these waveforms. The waveform window should now look like below:
The four intermediate registers shown here are IFIDIR, IDEXIR, EXMEMIR, MEMWBIR. These intermediate pipeline registers store the instructions in binary form as they propagate through the pipeline. IFIDIR is between the IF (fetch) and ID (decode) stage, IDEXIR is between the ID and EX (execute) stages, EXMEMIR is between the EX and MEM (memory) stages, and MEMWBIR is between the MEM and WB (writeback) stages.
Changes to the register state can also be observed:
3.2 Manually Using The Dat Files
The state of the registers and the memory at the end of the simulation can be manually inspected. The result of the execution can be used to check correctness.
regs.dat has the following initialization:
00000000000000000000000000000000
00000000000000000000000000001001
00000000000000000000000000010001
00000000000000000000000000001010
00000000000000000000000000001001
00000000000000000000000000011010
00000000000000000000000000010010
00000000000000000000000000000011
00000000000000000000000000011100
00000000000000000000000000010100
00000000000000000000000000011011
00000000000000000000000000000111
00000000000000000000000000010110
00000000000000000000000000001110
00000000000000000000000000010011
00000000000000000000000000001001
00000000000000000000000000000110
00000000000000000000000000001101
00000000000000000000000000010001
00000000000000000000000000000011
00000000000000000000000000000101
00000000000000000000000000000101
00000000000000000000000000011000
00000000000000000000000000011110
00000000000000000000000000001011
00000000000000000000000000001101
00000000000000000000000000011011
00000000000000000000000000010100
00000000000000000000000000001110
00000000000000000000000000010100
00000000000000000000000000000001
00000000000000000000000000011100
dmem.dat has the following values:
00000000000000000000000000000110
00000000000000000000000000010011
00000000000000000000000000011111
00000000000000000000000000001100
00000000000000000000000000001101
00000000000000000000000000001011
00000000000000000000000000011011
00000000000000000000000000000110
00000000000000000000000000000010
00000000000000000000000000011000
00000000000000000000000000001000
00000000000000000000000000001101
00000000000000000000000000011010
00000000000000000000000000000011
00000000000000000000000000011111
00000000000000000000000000011010
00000000000000000000000000010010
00000000000000000000000000010111
00000000000000000000000000011000
00000000000000000000000000011010
00000000000000000000000000011101
00000000000000000000000000010111
00000000000000000000000000001001
00000000000000000000000000010100
00000000000000000000000000010010
00000000000000000000000000000110
00000000000000000000000000010100
00000000000000000000000000010001
00000000000000000000000000000010
00000000000000000000000000000001
00000000000000000000000000011101
00000000000000000000000000000111
imem.dat has the following initialization:
00000000100000100100000000100000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
The instruction in the first word is ‘add r8, r4, r2’.
Let’s break this into the fields of an instruction word in MIPS to understand it.
r2 is initialized to 10001 (in binary), as specified by regs.dat. The value of r4 is 1001. Lastly, the value in r8 is initialized to 11100. The generated regs_result.dat should show that r8 has the value r4 + r2, which is 11010. All other instructions in memory are initialized to 0, which represents nop instructions.
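The field breakdown of the first instruction word can be reproduced with a few shifts and masks. This Python sketch assumes the standard MIPS R-format layout (6-bit opcode, 5-bit rs, rt, rd and shamt, 6-bit funct), which is consistent with the word decoding to ‘add r8, r4, r2’; the function name decode_r_type is hypothetical.

```python
def decode_r_type(word_bits):
    """Split a 32-bit binary string into MIPS R-format fields."""
    word = int(word_bits, 2)
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source register
        "rd": (word >> 11) & 0x1F,      # bits 15..11, destination register
        "shamt": (word >> 6) & 0x1F,    # bits 10..6
        "funct": word & 0x3F,           # bits 5..0
    }


fields = decode_r_type("00000000100000100100000000100000")
# opcode 0, rs 4, rt 2, rd 8, shamt 0, funct 32: add r8, r4, r2

# Expected result of the add: r4 + r2 = 0b1001 + 0b10001 = 0b11010
expected_r8 = 0b1001 + 0b10001
```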
regs_result.dat shows the final state of the register file after executing the instructions from the test case.
00000000000000000000000000000000
00000000000000000000000000001001
00000000000000000000000000010001
00000000000000000000000000001010
00000000000000000000000000001001
00000000000000000000000000011010
00000000000000000000000000010010
00000000000000000000000000000011
00000000000000000000000000011010
00000000000000000000000000010100
00000000000000000000000000011011
00000000000000000000000000000111
00000000000000000000000000010110
00000000000000000000000000001110
00000000000000000000000000010011
00000000000000000000000000001001
00000000000000000000000000000110
00000000000000000000000000001101
00000000000000000000000000010001
00000000000000000000000000000011
00000000000000000000000000000101
00000000000000000000000000000101
00000000000000000000000000011000
00000000000000000000000000011110
00000000000000000000000000001011
00000000000000000000000000001101
00000000000000000000000000011011
00000000000000000000000000010100
00000000000000000000000000001110
00000000000000000000000000010100
00000000000000000000000000000001
00000000000000000000000000011100
The r8 register has been updated.
mem_result.dat has the following result.
00000000000000000000000000000110
00000000000000000000000000010011
00000000000000000000000000011111
00000000000000000000000000001100
00000000000000000000000000001101
00000000000000000000000000001011
00000000000000000000000000011011
00000000000000000000000000000110
00000000000000000000000000000010
00000000000000000000000000011000
00000000000000000000000000001000
00000000000000000000000000001101
00000000000000000000000000011010
00000000000000000000000000000011
00000000000000000000000000011111
00000000000000000000000000011010
00000000000000000000000000010010
00000000000000000000000000010111
00000000000000000000000000011000
00000000000000000000000000011010
00000000000000000000000000011101
00000000000000000000000000010111
00000000000000000000000000001001
00000000000000000000000000010100
00000000000000000000000000010010
00000000000000000000000000000110
00000000000000000000000000010100
00000000000000000000000000010001
00000000000000000000000000000010
00000000000000000000000000000001
00000000000000000000000000011101
00000000000000000000000000000111
There is no change to the contents of the memory locations.
3.3 Auto-Decoding Utility
A binary called ‘decode’ is provided. It should be placed in the same folder as an imem.dat file and executed using ‘./decode’. This will generate a decoded version of imem.dat named decoded_imem.dat. For example, running decode on the above example yielded: “add r8, r4, r2”.
4. Automated Execution And Grading
We support automatic grading of the assignment on Vole. Each instruction is tested using different memory and register states. For the ALU-type instructions ADD, NAND, XOR, SRL and SGT, ten test cases are provided for each instruction, but only two test cases per instruction are run by the grader. Two points are assigned for each test case that is graded. The total is twenty points for all these test cases combined.
For each of the remaining instructions, CINDC and BEQINIT, ten test cases are provided. These are each worth two points. Thus, a total of twenty points is assigned for each instruction. The remaining forty points are based on test cases that are random sequences of instructions. Forty such test cases have been provided, each of which is worth one point.
In order to run all these test cases, a script is provided to you. Use the command ./grade.sh. It may take an hour or more to complete grading all the test cases. A file ‘score.txt’ is created, which contains your score for each of the test cases. Please do your automatic grading on Vole because the grader has not been tested in other environments.
5. Problems
5.0 Problem 0 (20 points):
Currently, an ADD instruction has been implemented in the execute.sv file.
Modify execute.sv so that it can run the XOR, NAND, SGT (Set Greater Than) and SRL (Shift Right Logical Variable) instructions.
Hint: You need to modify the EX stage so that the ALU performs these operations. The function codes for XOR, NAND, SGT, and SRL are 50, 51, 52 and 53, respectively.
ADD performs the operation R[rd] = R[rs] + R[rt] (already implemented)
XOR performs the operation R[rd] = (R[rs] ^ R[rt])
NAND performs the operation R[rd] = (~(R[rs] & R[rt]))
SGT performs the operation R[rd] = (R[rs] > R[rt])
SRL performs the operation R[rd] = (R[rs] >> R[rt])
Hint: Fill in the TODO parts (marked in the comments) of the provided code.
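The five ALU operations listed above can be checked against a small reference model before writing the Verilog. This Python sketch is not the execute.sv code. It makes three assumptions that the handout does not state: registers are 32 bits wide, SGT compares the operands as signed values (selectable through the hypothetical signed_compare parameter), and SRL shifts by the full value of R[rt]. The ADD function code 32 is taken from the decoded example word in Section 3.2.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(funct, a, b, signed_compare=True):
    """Reference model of the ALU for function codes 32 and 50 to 53."""
    a &= MASK32
    b &= MASK32
    if funct == 32:  # ADD
        return (a + b) & MASK32
    if funct == 50:  # XOR
        return a ^ b
    if funct == 51:  # NAND
        return ~(a & b) & MASK32
    if funct == 52:  # SGT (signedness is an assumption)
        if signed_compare:
            return 1 if to_signed(a) > to_signed(b) else 0
        return 1 if a > b else 0
    if funct == 53:  # SRL, shifting by the full value of b
        return a >> b
    raise ValueError(f"unsupported function code: {funct}")
```

If the provided test cases contain operands with the top bit set, comparing the model's SGT result with regs_result_expected.dat is one way to find out which comparison the grader expects.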
5.1 Problem 1 (20 points):
Add support for the “conditional increment decrement” instruction (opcode 47), which uses the ‘R’ instruction format and is defined as follows:
cindc $rs, $rt, $rd
if (R[rs] > 0) R[rd] = R[rs] - R[rt]
else R[rd] = R[rs] + R[rt]
We need to modify the ID stage.
Hint: Fill in the TODO parts (marked in the comments) of the code.
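The CINDC definition above translates directly into a reference function. This Python sketch assumes 32-bit registers and a signed interpretation of the test R[rs] > 0; neither is stated in the handout, and the function name cindc is used here only for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def cindc(rs_value, rt_value):
    """Return the value written to R[rd] by cindc."""
    if to_signed(rs_value) > 0:  # assumption: signed comparison with zero
        return (rs_value - rt_value) & MASK32
    return (rs_value + rt_value) & MASK32
```

For example, with R[rs] = 9 and R[rt] = 4 the result is 5, while with R[rs] = 0 and R[rt] = 4 the result is 4.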
5.2 Problem 2 (20 points):
Augment the given MIPS design so that it can run the “branch if equals to absolute address” instruction (opcode 48), which uses the ‘I’ instruction format and is defined as follows:
beqinit $rs, $rt, offset
if (R[rs] == R[rt]) {
PC = PC + offset
R[rt] = 1
}
Key modifications to the code are in the ID stage.
Hint: Fill in the TODO parts (marked in the comments) of the code in decode.sv.
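The architectural effect of BEQINIT can be sketched as a function from the current state to the next state. This Python model ignores pipeline timing, so its pc argument stands for whatever PC value the definition above refers to. The fall-through value PC + 4 comes from the PCR stage description in Section 2.1, and the function name beqinit and the list-based register file are hypothetical.

```python
def beqinit(pc, regs, rs, rt, offset):
    """Return (next_pc, next_regs) after executing beqinit $rs, $rt, offset."""
    next_regs = list(regs)  # copy so the caller's register file is untouched
    if regs[rs] == regs[rt]:
        # Branch taken: redirect the PC and initialize R[rt] to 1.
        next_regs[rt] = 1
        return pc + offset, next_regs
    # Branch not taken: fall through to the next sequential instruction.
    return pc + 4, next_regs
```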
5.3 Problem 3 (40 points):
In this problem, we create sequences containing different kinds of instructions to see whether they work together. A key part of this problem is to implement data forwarding. Two kinds of forwarding need to be implemented, from MEM to EX and from WB to EX; both assign the values of Ain and Bin in forward.sv. This part also checks whether your BEQINIT instruction works, because incorrect branching may cause an incorrect result. It also checks your CINDC results, because incorrect CINDC results can cause a wrong overall register and memory state.
The random tests in the folder random_tests contain random instruction sequences in which forwarding may occur. One point is assigned for each test case.
Hint: Fill in the TODO parts (marked in the comments) of the code in forward.sv.
5.4 Test Cases:
In total, seventy test cases will be evaluated by the autograder. Ten test cases are for the new ALU instructions. Another twenty test cases are for the CINDC and BEQINIT instructions, ten for each instruction. The remaining forty test cases are arbitrary mixes of all of these instructions. They also include predefined instructions such as LW and SW. Use the automatic grading (see Section 4) to run all these test cases and obtain a score. Two points are assigned per test case for the specific test cases and one point per test case for the random test cases.
5.5 Handout:
You are provided with an incomplete MIPS-like behavioral model (i.e. only the components required for this lab assignment), two test case folders specific_tests and random_tests, this pdf and an automatic grader to calculate your score.
5.6 Handin:
You only need to submit your modified pcr.sv, execute.sv, fetch.sv, decode.sv, mem.sv, wb.sv and forward.sv files on Canvas.
5.7 Grading Criteria:
Credit is assigned based on the provided test cases. It is also possible to assign partial credit based on your implementation if your solution does not work. Please write your code neatly and comment it; this is to your advantage.
5.8 Important:
Please verify that your code can be simulated by Vivado to completion. Code that does not complete simulation will not receive full credit.
2023-12-20