Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CSE 141L Lab 1.  9-Bit Instruction Set Architecture

Summer II, 2022

In this lab you shall design the instruction set and overall architecture for your own special-purpose (very) reduced instruction set (RISC) processor. You will design the hardware for your processor core in subsequent labs.

Your processor shall have 9-bit instructions (machine code) and shall be optimized for three simple programs, described below. For this lab, you shall design the instruction set and instruction formats and code three programs to run on your instruction set. Given the tight limit on instruction bits, you need to consider the target programs and their needs carefully. The best design will come from an iterative process of designing an ISA, then coding the programs, redesigning the ISA, etc.

Your instruction set architecture shall feature fixed-length instructions 9 bits wide. Your instruction-set specification should describe:

● what operations it supports and what their respective opcodes are.

               For ideas, see MIPS or ARM instruction lists  

● how many instruction formats it supports and what they are (in detail -- how many bits for each field, and where they’re found in the instruction). Your instruction format description should be detailed enough that someone could write an assembler (a program that creates machine code from assembly code) for it. (Again, refer to ARM or MIPS.)

● number of registers, and how many general-purpose or specialized. All internal data paths and storage will be 8 bits wide.

● addressing modes supported (this applies to both memory instructions and branch instructions). That is, how are addresses constructed or calculated? Lookup tables? Sign extension? Direct addressing? Indirect, as used in linked lists or ARM or MIPs to address the data_memory from reg_file contents?

For this to fit in a 9-bit field, the memory demands of these programs will have to be small. For example, you will have to be clever to support a conventional main memory of 256 bytes (8-bit address pointer). You should consider how much data space you will need before you finalize your instruction format. Your instructions are stored in a separate memory, so that your data addresses need be only big enough to hold data. Your data memory is byte-wide, i.e., loads and stores read and write exactly 8 bits (one byte). Your instruction memory is 9 bits wide, to hold your 9-bit machine code, but it can be as wide as you need to hold all three programs.

You shall write and run three programs on your ISA. You may assume that the first starts at address 0, and the other two are found in memory after the end of the first program (at some nonoverlapping address of your choosing). The specification of your branch instructions may depend on where your programs reside in memory, so you should make sure they still work if the starting address changes a little (e.g., if you have to rewrite one of the programs and it causes the others to also shift). This approach will allow you to put all three programs in the same instruction memory later on in the quarter.

We will impose the following constraints on your design, which will make the design a bit simpler. You shall assume a single-address data memory (Verilog design provided). You shall also assume a register file (or whatever internal storage you support) that can write to only one register per instruction. You may also have a single ALU condition/flag register (e.g., carry out, or shift out, sign result, zero bit, etc., like ARM's Z, N, C, and V status bits) that can be written at the same time as an 8-bit register, if you want. You may read more than one register per cycle. Please restrict register file depth to no more than 16 registers. Also, manual loop unrolling of your code is not allowed.

In addition to these constraints, the following suggestions will either improve your performance or greatly simplify your design effort. In optimizing for performance, distinguish between what must be done in series vs. what can be done in parallel. An instruction that does an add and a subtract (but neither depends on the output of the other) takes no longer than a simple add instruction. Similarly, a branch instruction where the branch condition or target depends on a memory operation will make things more difficult later on.

Generic, general-purpose ISAs (that is, those that will execute other programs just as efficiently as those shown here) will be seriously frowned upon. We really want you to optimize for these programs only, as is common practice in consumer embedded systems with tight cost consstraints.  

You shall turn in a lab report, plus program listings. The report will answer the questions posed below. In describing your architecture, keep in mind that the person grading it has much less experience with your ISA than you do. It is your responsibility to make everything clear -- one objective of this course is to help you improve your technical writing and reporting skills, which will benefit you richly in your career.

For each lab, there will be a set of requirements and questions that direct the format of the writeup and make it easier to grade, but strive to create a report you can be proud of. Sometimes that may require a little repetition, e.g. describing something where you think it belongs in the report, and then again in the “question” part, so the graders won’t miss it.

Additional constraints:

1) Your ALU instructions will be comparable in complexity to those in ARM.

2) Your data memory will have only one address pointer input port, for both input and output.

3) Your register file will have no more than two output ports and one input port. You may use separate pointers for reads and writes, if you wish.

4) You may use lookup tables (LUTs) / decoders, but these are limited to 32 elements each (i.e., address pointer width up to 5 bits). We do not allow something like a 512-element LUT with 32-bit outputs which simply maps your restricted 9-bit machine code field to a 32-bit clone of ARM or MIPS instructions. (It was "cute" the first time a team did this.)

Components of Lab 1 report:

0. Team.

List the names of all members of your team.

1. Introduction.  

This should include the name of your  architecture, overall philosophy, specific goals strived for and achieved.

2. Instruction formats.  

List all formats and an example of each. (ARM has R, I, and B type instructions, for example.)

3. Operations.  

List all instructions supported and their opcodes/formats.

4. Internal operands.  

How many registers are supported?  Is there anything special about any of the registers, or all of them general purpose?

5. Control flow (branches).  

What types of branches are supported?  How are the target addresses calculated?  What is the maximum branch distance supported?

6. Addressing modes.  

What memory addressing modes are supported, e.g. direct, indirect?  How are addresses calculated?  Give examples.

Additionally, please explicitly answer the following questions.

7. Can you classify your machine in any of the classical ways (e.g., stack machine, accumulator, register, load-store)? If so, which?  If not, devise a name for your class of machine.

8. Give an example of an “assembly language” instruction in your machine, then translate it into machine code.

For 9-11, give assembly instructions.  Make sure your assembly format is either very obvious or well described, and that the code is well commented. If you also want to include machine code, the effort will not be wasted, since you will need it later. We shall not correct/grade the machine code. State any assumptions you make. If you need initial nonzero values in registers or memory, you need to put them there. (Exception -- the test bench will load the incoming operands for you.)

9.  Program 1 (encrypter):

A message will be implanted by the test bench into your data memory, in locations [0:51], one ASCII character each):

                                     Example: Mr. Watson, come here. I want to see you.

(Spaces and punctuation marks count.) Start with this message, but you should be able to handle any 1- to 52-byte message that uses ASCII characters corresponding to 0x20 through 0x9f.

1. (See the 8-bit ASCII table at the end of this writeup to convert this sequence of characters into numerical bytes. Example:   M = 77 decimal = 0x4d)  

                             My Verilog test bench will handle this conversion for you.

2. The number of required initial space characters will be stored in data memory at location [61]. Prepend a preamble of ASCII space characters (space = 32 decimal = 0x20). After encryption, you shall put these into data_memory starting at [64].

3. Write a 7-bit maximal length linear feedback shift register (LFSR) using the tap sequence number stored in data memory location 62. For example, if the tap for the corresponding stored number is 7543, then:

             x'[0]    = x[7]^x[5]^x[4]^x[3];

             x'[6:1] = x[5:0];

      where ^ denotes XOR and x[6] is the most significant bit of x and x[0] is the least. x'[6:0] is the "next state," and x[6:0] is the "present state."

      See 8-tap LFSR schematic at the end of this writeup to help you visualize what is going on. The logic symbols are XORs.

     There are 9 different LFSR feedback patterns that result in a maximal length sequence of all 127 nonzero states. Your encryptor should be programmable to any randomly selected LFSR feedback pattern. The testbench will load the LFSR pattern into data memory location [62] and the starting LFSR state into data_memory location [63].

4. For a count equal to the value in memory[61], bitwise XOR ASCII space = 0x20 with each successive value of your LFSR, subtract 0x20, and write the results into data_memory[64:64+(mem[61])][6:0], with the constraint that there will be no fewer than 10 preamble space characters and no more than 15. Thus, if data_memory[61]<10, insert 10 spaces. If data_memory[61]>15, insert only 15 spaces. (Rationale: we need a known 10-character sequence to synchronize our Lab 2 decoder.)

5. Now read each value of the message out of memory, starting at location [0], XOR it with the next state of the LFSR, subtract 0x20, and write the result back into memory locations starting where step 4 left off and continuing to [127]. (This includes the prepended and appended spaces characters, so just pad the end of the message with encrypted space characters to fill the space.)

6. Finally (or as you go), set bit [7] of each value in data_mem[64:127] to the reduction XOR (parity) of that location's bits[6:0]. This is precisely the P[0] or P0 augmented parity bit we saw in CSE140L, for those who took that class with me.

7. The test bench will write out the resulting message by looking up the stored values in the ASCII table.  

Your ISA needs to be able to accomplish the above, starting with some way to bring in data from data_memory[0:63], generate a 7-bit LFSR, bitwise XOR each preamble (ASCII space) and data byte with a different LFSR state, subtract 32, insert parity in each MSB, and store the result back into data_memory[64:127]. See in-class demonstration.

10. Program 2 (decrypter):

An encrypted message from Program 1 will now occupy data memory locations [64:127].

1. By examining the first 10 bytes of this message, figure out the seed (inital) value for the LFSR and its feedback pattern. You will probably need to perform a correlation to accomplish this. (You, of course, know your own seed, but assume you received someone else's encrypted message, with an unknown seed and pattern. How would you crack the code, given that there are 127 possible initial seeds and 9 maximal length feedback patterns?)

2. Proceed as in Program 1, except that we'll be reading from memory locations [64:127] and writing back to [0:63]. (This decrypted message will start with 9 to 15 ASCII space charcters, followed by the message itself, and ending in ASCII space characters as needed to fill the space.)

3. If you have properly seeded your decrypter's LFSR and can use the same feedback pattern as in the encrypter, you should be able to recover the original message. Don't forget to add the offset of 32 back in.  

Your ISA needs to be able to accomplish everything it can already do for Program 1, but in addition, it needs to be able to search through LFSR states and identify the correct one feedback and initial state. This is the heart of the assignment.  

11. Program 3 (error detection and remove initial space characters)

1. This program shall detect the location of the first non-space character in the message, since there will be leading preamble padding bits of 0x20. (The message itself may also have started with additional space characters. We shall remove these, as well, because we have no way of distinguishing padding preambles from starting spaces within the message body.) It will copy this character into memory location[0], and successive non-zero characters into memory [1, ...]. At the end of the message, it shall pad the remaining memory address values up to [63] with ASCII spaces.  

2. The other difference from Program 2 is that a few of the encrypted message characters may have one bad (flipped) bit. (Remember bit [7], our global parity? This is for error detection.) As you load each inoming message character, check its lower 7 bits for consistency with its highest bit ([7]). Half of the 256 possible 8-bit values will be wrong, whereas the others will be correct. If you detect a corrupt character, insert an error flag, 0x80, into the corresponding output character stream.

3. Note that your device shall run all three programs in succession, controlled by a 4-bit interface with the test bench:

clk: digital clock (generated in test bench, input to your device)

rst: master reset (generated in test bench, input to your device -- in particular, puts your program counter at starting instruction address, most likely 0)

req: request device to start next program (generated in test bench, input to your device)

ack: "program done" flag (output generated by your device, tells test bench "check my work and then ask for the next program")

Note in particular that your device needs to bring the acknowledge / ack signal high right after it completes each program

Appendix 1. ASCII table

 

Appendix 2. 8-tap LFSR schematic -- for feedback pattern 7, 5, 4, 3 (numbering 0 to 7 instead of 1 to 8)

 

This pattern can be represented as 0xb8 (why?). The full list of patterns for an 8-bit LFSR is:

                       (0x) e1, d4, c6, b8, b4, b2, fa, f3

For a 7-bit, 0-indexed LFSR, the maximal length feedback positions are [6,5] = 0x60, [6,3], [6,5,4,3], [6,5,4,1], [6,5,3,1], [6,5,3,0], [6,4,3,2], [6,5,4,3,2,1], and [6,5,4,3,1,0]

The testbench will randomly assign both a maximal length feedback pattern and a nonzero starting state to your encrypter. Your decrypter then needs to decode the message properly for any given starting state and feedback pattern.

As discussed in class, there are 9 possible feedback tap patterns for a maximal-length (all 127 nonzero states covered) length-7 LFSR with either 2, 4, or 6 active taps, and we can ask you to encrypt using any one of the 9 and to decrypt a message that was encrypted with an (unknown) one of the 9 LFSR patterns.

Lab 1 flow, showing loads and stores:

LFSR:

1) LOAD LFSR(0) = dat_mem[63]

2) LOAD taps = dat_mem[62]

3) LFSR(i+1) = LFSR_function(LFSR(i))

        where LFSR function = (input<<1)|(^(input & taps))

Source data:

1) LOAD N = dat_mem[61]

2)  for first N cycles:

source(i) = 0x20 - 0x20 = 0

3)  for remaining cycles, starting w/ i=N:

source(i) = dat_mem[i-N] - 0x20

Destination data:

1) dest(i)[6:0]  = source(i) ^ LFSR(i) + 0x20

2) dest(i)[7] = ^dest(i)[6:0]

3) STORE: dat_mem[64+i] = dest(i)