CSE 340 SPRING 2022 – Project 4
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
CSE 340 SPRING 2022 – Project 4
2022
Abstract
The goal of this project is to give you some hands-on experience with implementing a small compiler. You will write a compiler for a simple language. You will not be generating assembly code. Instead, you will generate an intermediate representation (a data structure that represents the program). The execution of the program will be done after compilation by interpreting the generated intermediate representation.
1 Introduction
You will write a small compiler that will read an input program and represent it in a linked list. A node of the linked list represents one instruction. An instruction node specifies: (1) the type of the instruction, (2) the operand(s) of the instruction (if any) and, for jump instructions, the next instruction to be executed (the default is for the next instruction in the list to be executed). After the list of instructions is generated by your compiler, your compiler will execute the generated list of instructions by interpreting it. This means that the program will traverse the data structure and, at every node it visits, it will “execute” the node by changing the content of memory locations corresponding to operands and the proceed to the next instruction to be executed after determining what that instruction should be. This process continues until there is no next instruction to execute. We have provided the code to execute the intermediate representation, but you should understand what the provided code expects from your intermediate representation.
These steps are illustrated in the following figure
The remainder of this document is organized into the following sections:
1. Grammar Defines the programming language syntax including grammar.
2. Execution Semantics Describe statement semantics for assignment, input, if, while, switch, for and output statements.
3. How to generate the linked list of instructions Explains how to generate the interme- diate representation (data structure). You should read this sequentially and not skip around. because it is explained in an incremental manner.
4. Requirements Lists other requirements.
5. Grading Describes the grading scheme.
2 Grammar
The grammar for this project is the following:
program |
→ var section body inputs |
var section |
→ id list SEMICOLON |
id list |
→ ID COMMA id list | ID |
body |
→ LBRACE stmt list RBRACE |
stmt list |
→ stmt stmt list | stmt |
stmt |
→ assign stmt | while stmt | if stmt | switch stmt | for stmt |
stmt |
→ output stmt | input stmt |
assign stmt |
→ ID EQUAL primary SEMICOLON |
assign stmt |
→ ID EQUAL expr SEMICOLON |
expr |
→ primary op primary |
primary |
→ ID | NUM |
op |
→ PLUS | MINUS | MULT | DIV |
output stmt |
→ output ID SEMICOLON |
input stmt |
→ input ID SEMICOLON |
while stmt |
→ WHILE condition body |
if stmt |
→ IF condition body |
condition |
→ primary relop primary |
relop |
→ GREATER | LESS | NOTEQUAL |
switch stmt |
→ SWITCH ID LBRACE case list RBRACE |
switch stmt |
→ SWITCH ID LBRACE case list default case RBRACE |
for stmt |
→ FOR LPAREN assign stmt condition SEMICOLON assign stmt RPAREN body |
case list |
→ case case list | case |
case |
→ CASE NUM COLON body |
default case |
→ DEFAULT COLON body |
inputs |
→ num list |
num list |
→ NUM |
num list |
→ NUM num list |
Some highlights of the grammar:
1. Division is integer division and the result of the division of two integers is an integer.
2. Note that if stmt does not have else.
3. Note that for has a very general syntax similar to that of the for loop in the C language
4. Note that the input and output keywords are lowercase, but other keywords are all upper- case.
5. condition has no parentheses.
6. There is no type specified for variables. All variables are int by default.
3 Variables and Locations
The var section contains a list of all variable names that can be used by the program. For each variable name, we associate a unique locations that will hold the value of the variable. This association between a variable name and its location is assumed to be implemented with a function location that takes a variable name (string) as input and returns an integer value. The locations where variables will be stored is called mem which is an array of integers. Each variable in the program should have a unique entry (index) in the mem array. This association between variable names and locations can be implemented with a location table.
As your parser parses the input program, it allocates locations to variables that are listed in the var section. You can assume that all variable names listed in the var section are unique. For each variable name, a new location needs to be associated with it and the mapping from the variable name to the location needs to be added to the location table. To associate a location with a variable, you can simply keep a counter that tells you how many locations have been used (associated with variable names). Initially the counter is 0. The first variable to be allocated a location will get the location whose index is 0 (mem[0]) and the counter will be incremented to become 1. The next variable to be associated a location will get the location whose index is 1 and the counter will be incremented to become 2 and so on.
4 Inputs
The list of input values is called inputs and appears as the last section of the input to your compiler. This list must be read by your compiler and stored in an inputs array, which is simply a vector of integers.
5 Execution Semantics
All statements in a statement list are executed sequentially according to the order in which they appear. Exception is made for the bodies of if stmt, while stmt, switch stmt, and for stmt as ex- plained below. In what follows, I will assume that all values of variables as well as constants are stored in locations. This assumption is used by the execution procedure that we provide. This is not a restrictive assumption. For variables, you will have locations associated with them. For con- stants, you can reserve a location in which you store the constant (this is like having an unnamed immutable variable).
2022-04-08