
CIT 593 – Module 11 Assignment

Making the LC4 Assembler Instructions


Assignment Overview

From lecture you’ve learned that C is file-oriented and that files are how C represents I/O devices.

C files fall into two categories: "text" and "binary". In this assignment you’ll work with both types by reading in a text file and writing out a binary file.

You will read an arbitrary .asm file (a text file intended to be read by PennSim) and write a .obj file (the same type of binary file that PennSim would write out).

Aside from reading and writing out the files, your task will be to make a mini-LC4-Assembler! An assembler is a program that reads in assembly language and generates its machine equivalent.

This assignment will require a bit more programming rigor than we’ve had thus far, but now that you’ve gained a good amount of programming skill in this class and in others, it is the perfect time to tackle a large programming assignment (which is why the instructions are so many pages).

Learning Objectives

This assignment will cover the following topics:

●    Review the LC4 Object File Format

●    Read text files and process binary files

●    Assemble LC4 programs into executable object files

●    Use debugging tools such as GDB

Advice

●   Start early

●   Ask for help early

●    Do not try to do it all in one day


Getting Started

Codio Setup

Open the Codio assignment via Canvas.  This is necessary to link the two systems.

You will see many directories and files.  At the top-level workspace directory, the main files are asm_parser.h, asm_parser.c, assembler.c, and PennSim.jar.

Do not modify any of the directories or any file in any of the directories.

Starter Code

We have provided a basic framework and several function definitions that you must implement.

assembler.c


- must contain your main function.

asm_parser.c

- must contain your asm_parser functions.

asm_parser.h

- must contain the definition for ROWS and COLS

- must contain function declarations for read_asm_file, parse_instruction, parse_reg, parse_add, parse_mul, str_to_bin, write_obj_file, and any helper function you implement in asm_parser.c

test1.asm

- example assembly file

PennSim.jar

- a copy of PennSim to check your assembler


Object File Format Refresher

The following is the format for the binary .obj files created by PennSim from your .asm files. It represents the contents of memory (both program and data) for your assembled LC-4 Assembly programs. In a .obj file, there are 3 basic sections indicated by 3 header “types” (Code, Data, and Symbol):

●    Code: 3-word header (xCADE, <address>, <n>), n-word body comprising the instructions.

○   This corresponds to the .CODE directive in assembly.

●    Data: 3-word header (xDADA, <address>, <n>), n-word body comprising the initial data values.

○   This corresponds to the .DATA directive in assembly.

●   Symbol: 3-word header (xC3B7, <address>, <n>), n-character body comprising the symbol string. These are generated when you create labels (such as “END”) in assembly.  Each symbol is its own section.

○    Each character in the file is 1 byte, not 2 bytes.

○   There is no NULL terminator.
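As a rough illustration of the Code section layout above, here is a minimal sketch that writes the 3-word header followed by the instruction words. The function names write_word_sketch and write_code_section_sketch are hypothetical (they are not part of the starter code), and the sketch assumes each 16-bit word is stored high byte first; confirm the byte order against lecture or a .obj file generated by PennSim.

```c
#include <stdio.h>

/* Write one 16-bit word as two bytes, high byte first.
 * ASSUMPTION: big-endian word order; verify against a PennSim-generated .obj. */
int write_word_sketch(FILE *out, unsigned short word)
{
    unsigned char bytes[2];

    bytes[0] = (unsigned char)((word >> 8) & 0xFF);
    bytes[1] = (unsigned char)(word & 0xFF);
    return fwrite(bytes, 1, 2, out) == 2 ? 0 : 1;
}

/* Write a Code section: header (xCADE, address, n), then n instruction words.
 * Returns 0 on success and non-zero if any write fails. */
int write_code_section_sketch(FILE *out, unsigned short address,
                              const unsigned short *words, unsigned short n)
{
    unsigned short i;

    if (write_word_sketch(out, 0xCADE) != 0) return 1;
    if (write_word_sketch(out, address) != 0) return 1;
    if (write_word_sketch(out, n) != 0) return 1;
    for (i = 0; i < n; i++) {
        if (write_word_sketch(out, words[i]) != 0) return 1;
    }
    return 0;
}
```

Writing byte by byte keeps the output independent of the byte order of the machine running the assembler.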


Requirements

General Requirements

●    You MUST NOT change the filenames of any file provided to you in the starter code.

●    You MUST NOT change the function declarations of any function provided to you in the starter code.

●    You MAY create additional helper functions.  If you do, you MUST correctly declare the functions in the appropriate header file and provide an implementation in the appropriate source file.

●    Your program MUST compile when running the command make.

●    You MUST NOT have any compile-time errors or warnings.

●    You MUST remove or comment out all debugging print statements before submitting.

●    You MUST NOT use externs or global variables.

●    You MAY use string.h, stdlib.h, and stdio.h.

●   You SHOULD comment your code since this is a programming best practice.

●    Your program MUST be able to handle  .asm files that PennSim would successfully assemble. We will not be testing with invalid .asm files.

●    You MUST provide a makefile with the following targets:

○   assembler

○   asm_parser.o

Assembler

assembler.c: main

●    You MUST NOT change the first four instructions already provided.

●    The main function:

○    MUST read the arguments provided to the program.

■   the user will use your program like this: ./assembler test1.asm

○    MUST store the first argument into filename.

○   SHOULD print an error1 message if the user has not provided an input filename.

○    MUST call read_asm_file to populate program[][].

○    MUST parse each instruction in program[][] and store the binary string equivalent into program_bin_str[][].

○    MUST convert each binary string into an integer (which MUST have the correct value when formatted with "0x%X") and store the value into program_bin[].

○    MUST write out the program into a .obj object file which MUST be loadable by PennSim's ld command.

asm_parser.c: read_asm_file

This function reads the user file.

●    It SHOULD return an error2 message if there is any error opening or reading the file.

●    It MAY try to check if the input program is too large for the defined variables, but we will not be testing outside the provided limits.

●    It MUST read the exact contents of the file into memory, and it MUST remove any newline characters present in the file.

●    It MUST work for files that have an empty line at the end and also for files that end on an instruction (i.e. do not assume there will always be an empty line at the end of the file).

●    It MUST return 0 on success, and it MUST return a non-zero number in the case of failure (it SHOULD print a useful error message and return 2 on failure).

asm_parser.c: parse_instruction

This function parses a single instruction and determines the binary string equivalent.

●    It SHOULD use strtok to tokenize the instruction, using spaces and commas as the delimiters.

●    It MUST determine the instruction function and call the appropriate parse_xxx helper function.

●    It MUST parse ADD, MUL, SUB, DIV, AND, OR, XOR instructions.

○    It MUST parse ADD IMM and AND IMM if attempting that extra credit.

●    It MUST return 0 on success, and it MUST return a non-zero number in the case of failure (it SHOULD print a useful error message and return 3 on failure).
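The tokenizing step can be sketched as follows. This is only an illustration: tokenize_sketch and MAX_TOKENS_SKETCH are hypothetical names, and the sketch works on a copy of the instruction because strtok overwrites the delimiters it finds with '\0'.

```c
#include <string.h>

#define MAX_TOKENS_SKETCH 4   /* mnemonic plus up to three operands */

/* Split a copy of one instruction on spaces and commas.
 * tokens[] ends up pointing into copy[]; returns the number of tokens found. */
int tokenize_sketch(const char *instr, char copy[], size_t copy_size,
                    char *tokens[MAX_TOKENS_SKETCH])
{
    int count = 0;
    char *tok;

    /* Work on a copy so the original row stays intact. */
    strncpy(copy, instr, copy_size - 1);
    copy[copy_size - 1] = '\0';

    tok = strtok(copy, " ,");
    while (tok != NULL && count < MAX_TOKENS_SKETCH) {
        tokens[count++] = tok;
        tok = strtok(NULL, " ,");   /* NULL continues with the same string */
    }
    return count;
}
```

For the line ADD R1, R0, R1 this yields the four tokens ADD, R1, R0, and R1; the first token can then be compared with strcmp to pick the matching parse_xxx function.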

asm_parser.c: parse_add

This function parses an ADD instruction and provides the binary string equivalent.

●    It MUST correctly update the opcode, sub-opcode, and register fields following the LC4 ISA.

●    It SHOULD call a helper function parse_reg, but we will not be testing this function.

●    It MUST return 0 on success, and it MUST return a non-zero number in the case of failure (it SHOULD print a useful error message and return 3 on failure).
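One possible way to build the binary string for ADD is sketched below. The names reg_bits_sketch and encode_add_sketch are hypothetical, and the signatures will differ from the parse_reg and parse_add declarations in the starter code. The field layout used here is opcode 0001, then Rd, Rs, sub-opcode 000, and Rt, which is consistent with the value 0x1201 listed later for ADD R1, R0, R1; check it against the LC4 ISA sheet.

```c
#include <string.h>

/* Append the 3-bit pattern for a register token such as "R5" to out.
 * Returns 0 on success, non-zero if the token is not R0..R7. */
int reg_bits_sketch(const char *reg, char *out)
{
    char bits[4];
    int n;

    if (reg == NULL || reg[0] != 'R' || reg[1] < '0' || reg[1] > '7' || reg[2] != '\0') {
        return 1;
    }
    n = reg[1] - '0';
    bits[0] = (n & 4) ? '1' : '0';
    bits[1] = (n & 2) ? '1' : '0';
    bits[2] = (n & 1) ? '1' : '0';
    bits[3] = '\0';
    strcat(out, bits);
    return 0;
}

/* Build the 16-character string for ADD Rd, Rs, Rt.
 * out must have room for at least 17 chars (16 bits plus '\0'). */
int encode_add_sketch(const char *rd, const char *rs, const char *rt, char *out)
{
    strcpy(out, "0001");                         /* opcode */
    if (reg_bits_sketch(rd, out) != 0) return 1; /* Rd */
    if (reg_bits_sketch(rs, out) != 0) return 1; /* Rs */
    strcat(out, "000");                          /* sub-opcode for ADD */
    if (reg_bits_sketch(rt, out) != 0) return 1; /* Rt */
    return 0;
}
```

The other arithmetic and logic instructions follow the same pattern with a different opcode or sub-opcode.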

asm_parser.c: parse_xxx

You MUST create a helper function similar to parse_add for the other instruction functions required in parse_instruction.

●   They MUST correctly update the opcode, sub-opcode, and register fields following the LC4 ISA.

●   They SHOULD call a helper function parse_reg, but we will not be testing this function.

●   They MUST return 0 on success, and they MUST return a non-zero number in the case of failure (they SHOULD print a useful error message and return a unique error number on failure).



asm_parser.c: str_to_bin

This function converts a C string containing 1s and 0s into an unsigned short integer.

● It MUST correctly convert the binary string to an unsigned short int which can be verified using the "0x%X" format.

● It SHOULD use strtol to do the conversion.

● It MUST return 0 on success, and it MUST return a non-zero number in the case of failure (it SHOULD print a useful error message and return 6 on failure).
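The strtol-based conversion can be sketched like this. The name bin_str_to_word_sketch and its out-parameter signature are hypothetical and may not match the str_to_bin declaration in the starter code; the point is only to show strtol with base 2 plus basic validation.

```c
#include <stdlib.h>

/* Convert a string of '0'/'1' characters into a 16-bit value.
 * Stores the result in *out and returns 0; returns non-zero on bad input. */
int bin_str_to_word_sketch(const char *bits, unsigned short *out)
{
    char *end = NULL;
    long value;

    if (bits == NULL || bits[0] == '\0') {
        return 1;
    }
    value = strtol(bits, &end, 2);   /* base 2: only '0' and '1' are digits */
    if (*end != '\0' || value < 0 || value > 0xFFFF) {
        return 1;                    /* stray character or out of 16-bit range */
    }
    *out = (unsigned short)value;
    return 0;
}
```

Printing the result with "0x%X" for the string 0001001000000001 should show 0x1201.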

asm_parser.c: write_obj_file

This function writes the program, in integer format, as a LC4 object file using the LC4 binary format.

● It MUST output the program in the LC4 binary format described in lecture and in the Object File Format Refresher section.

● It MUST change the extension of the input file to .obj.

● It MUST use the default starting address 0x0000 unless you are attempting the .ADDR extra credit.

● It MUST close the file with fclose.

● It MUST return 0 on success, and it MUST return a non-zero number in the case of failure (it SHOULD print a useful error message and return 7 on failure).

● The generated file MUST load into PennSim (and you MUST check this before submitting), and the contents MUST match the .asm assembly program.
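Changing the extension to .obj could look something like the sketch below. make_obj_name_sketch is a hypothetical helper; it assumes the last '.' in the name starts the extension, so a path whose only dot is in a directory name would need extra handling.

```c
#include <string.h>

/* Copy asm_name into out with its final extension replaced by ".obj".
 * If there is no '.', ".obj" is simply appended.
 * Returns 0 on success, non-zero if out is too small. */
int make_obj_name_sketch(const char *asm_name, char *out, size_t out_size)
{
    const char *dot = strrchr(asm_name, '.');
    size_t stem_len = (dot != NULL) ? (size_t)(dot - asm_name) : strlen(asm_name);

    if (stem_len + strlen(".obj") + 1 > out_size) {
        return 1;
    }
    memcpy(out, asm_name, stem_len);
    strcpy(out + stem_len, ".obj");
    return 0;
}
```

The resulting name can then be passed to fopen with the "wb" mode so the file is written as binary.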



Extra Credit

You may attempt any, all, or none of these extra credit options.  You MUST test using your own generated examples (we will not provide any).

Option 1: modify your read_asm_file function to ignore comments in .asm files.  You MUST handle all types of comments for credit.

Option 2: modify your program to handle ADD IMM and AND IMM instructions.  Both MUST work completely for credit.

Option 3: modify your program to handle the .CODE and .ADDR directives.

Option 4: modify your program to handle the .DATA and .ADDR directives.

Suggested Approach

This is a suggested approach.  You are not required to follow this approach as long as you follow all of the other requirements.

High Level Overview

Follow these high-level steps and debug thoroughly before moving on to the next.

1.   Initialize all arrays to zero or '\0'.

2.   Call read_asm_file to read the entire  .asm file into the array program[][].

a.   Using test1.asm as an example, after read_asm_file returns: program[][] should then contain:

3.   In a loop, for each row X in program[][]:

a.   Call parse_instruction, passing it the current row program[X][] as input. When parse_instruction returns, program_bin_str[X][] should be updated to have the binary equivalent (in string form).

b.   Call str_to_bin, passing program_bin_str[X][] to it. When str_to_bin returns, program_bin[X] should be updated to have the hexadecimal equivalent of the binary string from program_bin_str[X].

4.   Once the loop is complete, program_bin_str[][] should contain the binary-string equivalent of program[][]:



5.  Also after the loop is complete, the array program_bin[] should contain program_bin_str[][]’s equivalent in binary (formatted in hexadecimal here):

Index    Value
0        0x1201
1        0x1449
2        0x1691
3        0x12DA
4        0x5283
5        0x52D2
6        0x52DA

program_bin[] now represents the completely assembled program.

6.  Write out the .obj file in binary using the LC4 Object File Format.

Great High Level Overview, but I really need a Slightly More Detailed Overview

Okay, I guess we can give some more details.

Part 0: Setup the main Function to Read the Arguments

Open assembler.c from the helper files; it contains the main function for the program.

Carefully examine the variables at the top:

char* filename = NULL ;

char program [ROWS][COLS] ;

char program_bin_str [ROWS][17] ;

unsigned short int program_bin [ROWS] ;

The first variable, filename, is a pointer to a string that contains the name of the text file you’ll be reading. Your program must take in as an argument the name of a .asm file. As an example, once you compile your main program, you would execute it as follows:

./assembler test1.asm

In the last assignment you learned how to use the arguments passed into main. So the first thing to implement is a check of argc to see if the program has received any arguments. If it has, point filename to the argument that contains the file’s name. If the caller doesn’t provide an input file name, you should print an error message and return from main immediately. For example, something like this:

error1: usage: ./assembler <assembly_file>.asm

Start by updating assembler.c to read in the arguments and store the filename. Compile your changes and test them before continuing.
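The argument check described in this part might be sketched as follows. The helper name filename_from_args_sketch is hypothetical; in assembler.c the same logic would normally sit directly inside main, which would return a non-zero value when the helper reports a missing filename.

```c
#include <stdio.h>

/* Return the input filename from the command line, or NULL (after printing
 * the usage error) when the user did not supply one. */
char *filename_from_args_sketch(int argc, char **argv)
{
    /* argv[0] is the program name, so a filename means argc is at least 2. */
    if (argc < 2) {
        printf("error1: usage: ./assembler <assembly_file>.asm\n");
        return NULL;
    }
    return argv[1];
}
```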

Part 1: Read the .asm File

The next thing to do is to actually read the file into memory.  main's next call will be

int read_asm_file (char* filename, char program [ROWS][COLS] ) ;

The purpose of read_asm_file is to open the .asm file and place its contents into the 2D array program[][]. You must complete the implementation of this function in the provided helper file asm_parser.c.

Notice that it takes in the pointer to the filename that you’ll open in this function. It also takes in the two dimensional array, program, that was defined back in main.

You’ll also see that ROWS and COLS are two #define’d macros in asm_parser.h.  ROWS is set to 100 and COLS is set to 255. This means that you can only read in a program that is up to 100 lines long, and each line of this program can be no longer than 255 characters (including the terminating '\0').  When the program compiles, the compiler will replace all instances of ROWS with 100 and all instances of COLS with 255.  This means you can #define these values once to avoid magic numbers and simplify your program.

You’ll want to look at the class notes (or a C reference textbook) to use fopen to open the filename that has been passed in. Then you’ll want to use a function like fgets to read each line of the .asm file into the program[][] 2D array. Be aware that fgets keeps the newline character at the end of each line, and you’ll need to strip it from the input.
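A minimal sketch of the read loop, assuming the file has already been opened with fopen, is shown below. The names read_lines_sketch, ROWS_SKETCH, and COLS_SKETCH are hypothetical stand-ins for read_asm_file, ROWS, and COLS. Skipping blank lines is a design choice made here so that an empty line at the end of the file is not stored as an instruction; it is not something the starter code dictates.

```c
#include <stdio.h>
#include <string.h>

#define ROWS_SKETCH 100
#define COLS_SKETCH 255

/* Read lines from an already-open stream into lines[][], stripping the
 * line ending from each one. Returns the number of non-empty lines stored. */
int read_lines_sketch(FILE *in, char lines[ROWS_SKETCH][COLS_SKETCH])
{
    int row = 0;

    while (row < ROWS_SKETCH && fgets(lines[row], COLS_SKETCH, in) != NULL) {
        /* strcspn returns the index of the first '\r' or '\n',
         * or the string length if neither is present. */
        lines[row][strcspn(lines[row], "\r\n")] = '\0';
        if (lines[row][0] != '\0') {
            row++;   /* keep this line; a blank line is overwritten next pass */
        }
    }
    return row;
}
```

Because strcspn returns the string length when no line ending is found, the same code also handles a file whose last instruction is not followed by a newline.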

Take a look at the test1.asm file that was included in the helper files. It contains the following program:

ADD R1, R0, R1

MUL R2, R1, R1

SUB R3, R2, R1

DIV R1, R3, R2

AND R1, R2, R3

OR R1, R3, R2

XOR R1, R3, R2

After you complete read_asm_file and run it on test1.asm, your 2D array program[][] would contain the contents of the  .asm file in this order:

Notice there are no newline characters at the end of these lines.

If reading in the file is a success, return 0 from the function.  If not, print an error to the screen and return 2 from the function:

error2: read_asm_file failed

Implement and test this function carefully before continuing on with the assignment.