Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit


CS 352 Fall 2021

Compiler Project 2


1. Introduction

For an overview of all projects in this semester and their relationship, please review Project 1 handout. This project, as all other projects, is to be carried out by each student independently. Please review the course-specific requirements for academic honesty stated in HW0.


2. Main Tasks in Project 2

Project 2 (P2 in short) writes a YACC program to build a bottom-up parser (with the ability of type checking) for a subset of Minijava+ that has been posted on BrightSpace. This subset is defined in the file MiniJavaP2.txt posted on BrightSpace in the module titled Project2. This section gives an overview of the tasks in Project 2. Minute details will be clarified on Campuswire as various questions may be posed by students. For brevity, the parser constructed for this project that also performs type checking is called simply the parser.

Project 2 consists of the following main components.

1. Modify the Lex script written for P1 to extract new tokens, primarily due to the expansion of the Main class. As in P1, some token classes are embedded in the production rules and students need to define them in the Lex script.

2. Write a YACC/Bison script (Bison is a YACC-compatible tool that supports C++) to parse an arbitrary minijavaP2 program. For brevity, we will refer to YACC only throughout the rest of the projects this semester, while students are allowed to use Bison as well.

3. Add semantic actions to the YACC script such that an abstract syntax tree (AST) and a symbol table are built for the input program.

4. Traverse the AST to perform type checking and report type rule violations. (For brevity, we will simply say a type violation or a type error in this document.)

Items 2 to 4 are elaborated further in the following subsections.


2.1 Writing the Grammar

As discussed in the lectures, right-recursive production rules tend to make the parsing stack grow deep, thus potentially stressing the available memory. It is therefore advisable to change right-recursive rules written for P1 into left-recursive one.

More importantly, the production rules need to be written in such a way that the AST constructed by the semantic actions follow the correct precedence rules in Java. Students are strongly advised to write the production rules in such a way that YACC does not complain about any parsing conflicts. We do not prohibit the use of precedence directives in YACC such as %left to resolve parsing conflicts. We also allow students to use YACC’s default parsing actions (i.e., default to shift or to reduce). However, such practice is quite error prone and requires a thorough understanding of the parsing actions that it causes. It is the students’ responsibility to ensure that the existence of parsing conflicts does not cause any failure to correctly process the input programs.

Note that although YACC resolves parsing conflicts according to the specified operator precedence, it will still give warnings about the existence of such conflicts. To encourage students to apply the ideas discussed in the lectures to minimize parsing conflicts, bonus points will be granted for minimizing such conflicts. This will be explained in more details later in this document.


2.2 Semantic Actions and AST Construction

The purpose of constructing the AST for this project is to facilitate type checking. Students may find that many statements in MiniJavaP2 can be type checked without having an AST constructed. However, due to the possibility of forward reference to method names (i.e., calling a method that is defined after the method making such a call), the absence of AST can make type checking difficult to perform for such method invocations. A few of the test cases for grading will include such forward references. Up to 10 out 100 points will be assigned to such test cases. Hence, if the students want to attempt the full marks, building an AST would be strongly advised.

Ad-hoc ways to avoid constructing AST for handling forward references are possible. However, the interpreter built for the upcoming Project 3 (a two-week project) will necessitate an AST. Therefore, students are advised to construct the AST in Project 2 (a three-week project). We note that P3 and P2 will use the same grammar in MiniJavaP2. With the given syntax form in MiniJavaP2 and the type rules explained in Section 2.3, the operation order for expressions in the AST is not so critical for type checking as for program interpretation.


2.3 Type Rules and Type Checking

A correct program targeted by P2 must adhere to the following type rules.

1. A variable must be declared before it is referenced.

2. All type rules must be consistent with Java standard (as specified in Java document https://docs.oracle.com/javase/specs/jls/se7/html/jls-4.html#jls-4.2), except the following additional restrictions:

a. Both operands of any binary operation must have the same prime type allowed for that operation. (We have three prime types, int, boolean, and String.) An operand may be a constant, a variable, the intermediate result of an operation, or a value returned by an invoked method.

b. The left-hand side and right-hand side of an assignment operation must have the same prime type.

c. Logical operations are performed on boolean-typed operands only.

d. IF and WHILE conditions must be of the boolean type.

e. Concatenation (+) operations are performed on String-typed operands only.

f. Array indices must be of int type. To simplify the implementation, we assume that an array is either of one dimension or two dimensions.

3. To simplify the handling of strings, it is assumed that the input program contains no backslash character “\”.

4. System.out.println, System.out.print and Integer.parseInt are standard Java methods. The argument expression in the invocation of the first two standard methods may be any of the prime types listed in MiniJavaP2. Integer.parseInt converts a string to an integer and therefore expects a String argument when invoked.

5. Except for the standard Java methods explicitly mentioned below in this subsection, a method must be declared in the input program in order to be invoked, although forward references of method names are allowed. The argument list in the method invocation must agree with the list of the formals in the corresponding method declaration, including the strict agreement of the prime type for each parameter.

6. MiniJava+ has no type conversion and type casting between prime types. Neither does MiniJava+ allow overloading in method declaration and invocation.


2.4 Reporting Syntax Errors and Type Violations

The test cases for grading this project are divided in two categories, discussed separately below. All error reporting messages, including both syntax errors and type violations, must be written to the standard error file (stderr).


2.4.1. Syntax rule violation reporting

An input program in the first category contains a violation of the production rules in MiniJavaP2. The parser built for P2 must be able to report such a syntax error. The parser should print the following error message for the whole program in the exact format shown below before aborting the processing of the input program:

Syntax errors in <line_so_and_so>

where <line_so_and_so> is the line number where the first syntax error is found. (The students are allowed, but not required, to print any additional error information that may help them debug their project, but such optional additional print out must be on separate lines below the required error message mentioned above.) NOTE: If the input program contains no production rule violations, then nothing should be printed except for any type violations found (see below).


2.4.2. Type rule violation reporting

An input program in the second category contains no violations of the production rules, but it may or may not contain type rule violations. If the parser incorrectly reports a syntax error, it fails the test case no matter what is performed later for type checking for that program. If the input program contains neither syntax errors nor type rule violations, then the parser must print out nothing.

To avoid excessive type violation reporting due to propagation of type errors, we introduce the concept of undefined type. If an operation is found to violate a type rule, then the type of the operation result becomes undefined, unless the operation is an assignment. An assignment operation has its own type definition rule as stated below.

If a variable, say var_foo, is not yet declared a type, then its type is undefined. As soon as var_foo is declared a type, then it keeps the type no matter what type of expression is assigned to it. However, if var_foo is later re-declared a different type in the same name scope, then the type of var_foo becomes undefined. However, if the variable is re-declared the same type, it should still keep the same type.

Any operation that has an undefined operand will not be reported as a type rule violation, unless it is an assignment operation. Assigning to a variable that has not yet been declared is a type rule violation to be reported. A statement that re-declares a variable in the same name scope must also be reported as a type violation.

As an example, suppose var_b is int, var_c is String, var_d is int, and suppose we have two statements int var_a = var_d + var_b * var_c; String var_a;” The operation var_b * var_c is a type violation that must be reported. The type of var_b * var_c is undefined. The operation var_d + var_b * var_c also has an undefined type, but we will not report another type violation. Neither will the assignment to var_a trigger a type violation report. The variable var_a retains its int type until it is re-declared to be String, when the re-declaration must be reported as a type violation.

An input program may contain one or more type rule violations, and the parser must find and report all of them, with the caveat given above about the undefined types.

Your code generated by YACC must not produce any output (written to stderr or stdout) except those specified above. Hence, if your YACC program has additional printing statements added to yyerror() for debugging purpose, you must remove them before submitting the program. Not meeting this requirement will result in point deduction.

Each type rule violation reporting message must be printed on a separate line, in the form of

Type Violation in Line <line_number>

The error message must be printed exactly as specified, except that <line_number> is a line number and the number of spaces (must be > 0) around the printed line number is flexible. As a general rule, the line in which a type violation is found is reported. In some cases, two adjacent lines may be both responsible for a specific type violation. In such cases, set your own rule for reporting one of these adjacent lines. Occasionally yylineno may also deviate slightly from the actual error location slightly, which could be due to the way your production rules are designed. The test cases for grading will be written to minimize ambiguity of line positions. If needed, however, the grader may inspect the submitted code to see if an imprecise line number reported is permitted.

NOTE: Many users have found it highly useful to track the line and column numbers using YYLTYPE as in an online document. (https://www.gnu.org/software/bison/manual/html_node/Tracking-Locations.html). Its explanation in the subsection "Location Default Action" can be modified to detect the number of lines in a multiline comment. Note that from your YACC file you can access the contents of the YYLTYPE by using "@$, @1, @2, ..." in the same way "$$, $1, $2, ..." are accessed.


3. Grading

Each of the two categories of test cases mentioned in the previous section carries 50% of the grades, not counting the possible deduction due to submission requirement violations (see Section 4) and bonus points for resolving parsing conflicts.

The bonus points received for removing parsing conflicts depend on the thoroughness of such removal. The maximum bonus points, 5% of the total points, are granted if no parsing conflicts are reported by YACC. With each remaining parsing conflict reported, the bonus points received is decreased by 1 percentage point, i.e., 4% bonus points if one conflict remaining.


4. Submission instruction

It is mandatory for students to meet the submission instruction. Not following any of the requirement may result in grade deduction, up to a 10 point of penalty. (For example, if your code receives 90 points, the final score on the Brightspace for this assignment will be 80 points when maximum deduction is applied.)

1) Use the following command on CS lab machines that run any Linux OS to submit your homework. No other forms of submission (e.g., email) will be accepted.

turnin –c cs352 –p p2 [name of the directory to be submitted]

Prior to executing the turnin command, it is mandatory for you to make sure your submitted directory meets the following content requirement.

Make sure that your submitted directory contains only the following files: the Makefile, your YACC and Lex programs, and the .h and .c files (or the C++ equivalents) you have written to work with the Lex/YACC source code. This is to ensure that the system does not exceed the storage limit set by the system staff.

2) You are free to write your own Makefile, as long as the following requirements are met.

The grader will run your Makefile by executing the command “make parser” to produce the executable program named parser. Hence, it is important that your Makefile produces such an executable when invoked.

You must also write your Makefile such that when “make clean” is run, your working directory will be cleaned, with the only remaining files being the Makefile, your YACC and Lex programs, and the .h and .c files (or the C++ equivalents) you have written to work with the Lex/YACC source code. This is to allow the grader to easily remove the files generated by the testing after your work is graded.

If you are not familiar with writing of Makefiles, you may wish to use one of the two versions shown below and seek help from the TAs if running into issues. (Note that, usually the command names “yacc” and “bison” are aliases on our lab machines.)

The C version of the Makefile:

  parser:     y.tab.c lex.yy.c

                 gcc y.tab.c lex.yy.c -o parser

  y.tab.c :    parser.y

                 yacc -d -g --verbose parser.y

  lex.yy.c:    parser.l

                  lex parser.l

  clean:

     rm -f lex.yy.c y.tab.c parser

The C++ version:

FLEXFILE = parser.l

BISONFILE = parser.y

all: flex bison parser

parser:

           g++ -g scanner.cpp parser.cpp -o parser

flex:

           flex -o scanner.cpp ${FLEXFILE}

bison:

           bison -o parser.cpp --defines=parser.h ${BISONFILE}

clean:

           rm -f parser.h parser.cpp scanner.cpp parser

NOTE that in these Makefiles, a “tab” is required as the first character on each command line, e.g. “<tab> lex parser.l”. Otherwise you will see an error saying “missing separator”.

Students may wish to run “make clean” first each time they run “make” to rebuild their parser, just to make sure that the “make” tool does not mistakenly think your parser is up to date and does not rebuild.

4) Your program MUST compile and run without any error on CS lab’s Linux machines. Please do a final test of this before submission, especially if you do your project on other computers.

5) Your code will be tested for grading by the following command:

> ./parser program_name

Where “program_name” is the file name containing the input program.


5. Discussion Online

The policies for online discussion and information sharing concerning this project are the same as in Project 1 under the section titled “Discussion Online”. Note that the teaching staff has posted further announcements and clarifications for P1 on CampusWire. These still apply in Project 2.


APPENDIX An Authoritative Reference for Type Checking

The sections above have covered all type rules for P2. If the student wishes to compare with the rules in the full set of Java, the authoritative document for type and run-time evaluation rules for our projects this semester is Java Language Specification (Java SE 7bEdition, abbreviated as JSE7 in the rest of this document). This document can be found at https://docs.oracle.com/javase/specs/jls/se7/html/index.html

This is a large document, but only a small part concerns our Project 2, mainly because MiniJavaP2 is a small subset of Java. The limited syntax form of MiniJavaP2 dictates that its type rules are also limited.

Note that we imposed more strict type rules on MiniJavaP2 than the full set of Java. We understand that students have learned a set of type rules in the introductory Java programming course (CS180), but the JSE7 document formalizes the rules.