CS 2210 Programming Project (Part I)
Lexical Analyzer
Due date
Token specification
Figure 1 defines the tokens that must be recognized, with their associated symbolic names. All multi-symbol tokens are separated by blanks, tabs, newlines, comments or delimiters.
Comments are enclosed in /* ... */ and cannot be nested. An identifier is a sequence of (upper or lower case) letters or digits, beginning with a letter. Upper and lower case are not distinguished (i.e., the identifier ABC is the same as Abc). There is no limit on the length of identifiers. However, you may impose limits on the total number of distinct identifiers and string lexemes and on the total number of characters in all distinct identifiers and strings taken together. If defined, these
There should be no other limitation on the number of lexemes that the lexical analyzer will process.
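The comment and identifier rules above can be sketched as three small C helpers. This is only an illustration under stated assumptions, not the required implementation: the names skip_comment, identifier_length and fold_case are hypothetical, and in a Lex specification the same rules would normally be written as patterns.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* src must point at the two characters that open a comment.
   Returns a pointer just past the FIRST closing delimiter, so comments
   do not nest, or NULL if the comment is never closed. */
const char *skip_comment(const char *src)
{
    assert(src != NULL && src[0] == '/' && src[1] == '*');
    for (src += 2; src[0] != '\0'; src++) {
        if (src[0] == '*' && src[1] == '/')
            return src + 2;
    }
    return NULL;
}

/* Returns the length of the identifier that starts at src, or 0 if src
   does not begin with a letter. No length limit is imposed. */
size_t identifier_length(const char *src)
{
    size_t len = 0;
    assert(src != NULL);
    if (!isalpha((unsigned char)src[0]))
        return 0;
    while (isalnum((unsigned char)src[len]))
        len++;
    return len;
}

/* Lowercases a lexeme in place so that ABC and Abc compare equal. */
void fold_case(char *lexeme)
{
    assert(lexeme != NULL);
    for (; *lexeme != '\0'; lexeme++)
        *lexeme = (char)tolower((unsigned char)*lexeme);
}
```

Folding identifiers to one case before they are looked up is one simple way to make ABC and Abc map to the same entry; folding to upper case would work equally well.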
An integer constant is an unsigned sequence of digits representing a base 10 number. A string constant is a sequence of characters surrounded by single quotes, e.g. 'Hello, world'. Hard-to-type or invisible characters can be represented in character and string constants by escape sequences; these sequences look like two characters, but represent only one. The escape sequences supported by the MINI-JAVA language are \n for newline, \t for tab, \' for the single quote and \\ for the backslash. Any other character following a backslash is not treated as an escape sequence.
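A minimal sketch of escape decoding for the text between the quotes of a string constant follows. It rests on two assumptions: that the single-quote escape is written as a backslash followed by a single quote, and that an unrecognized sequence such as \q is kept as two literal characters. The function name decode_escapes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes the text between the quotes (raw) into out and returns the
   decoded length. out must hold at least strlen(raw) + 1 bytes, since
   decoding never makes the text longer. */
size_t decode_escapes(const char *raw, char *out)
{
    size_t n = 0;
    assert(raw != NULL && out != NULL);
    while (*raw != '\0') {
        if (raw[0] == '\\' && raw[1] != '\0') {
            char decoded = 0;
            switch (raw[1]) {
            case 'n':  decoded = '\n'; break;
            case 't':  decoded = '\t'; break;
            case '\'': decoded = '\''; break;
            case '\\': decoded = '\\'; break;
            default:   break;   /* assumption: not an escape, copy as-is */
            }
            if (decoded != 0) {
                out[n++] = decoded;
                raw += 2;
                continue;
            }
        }
        out[n++] = *raw++;
    }
    out[n] = '\0';
    return n;
}
```

Decoding before the lexeme is stored means that two string constants that denote the same characters end up as the same table entry.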
Token attributes
A unique identification of each token (an integer aliased with the symbolic token name) must be returned by the lexical analyzer. In addition, the lexical analyzer must pass extra information about some tokens to the parser. This extra information is passed to the parser as a single value, namely an integer, through a global variable as described below. For integer constants, the numeric value of the constant is passed. To allow other passes of the compiler to access the original identifier lexeme, the lexical analyzer passes an integer uniquely identifying an identifier (other than reserved words). String constants are treated in the same way, with a unique identifying number being passed. The unique identifying number for both identifiers and string constants should be an index (pointer) into a string table created by the lexical analyzer to record the lexemes. Identical identifiers should return the same index.
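For integer constants, the attribute is just the numeric value of the digit string. The sketch below shows one way to compute it with the standard strtol function while rejecting constants that do not fit in an int. The name integer_attribute and the choice to report overflow through the return value are assumptions; the handout does not say how oversized constants should be treated.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Converts a lexeme made only of decimal digits into its value.
   Returns 1 and stores the value on success; returns 0 if the lexeme is
   empty, contains a non-digit tail, or does not fit in an int. */
int integer_attribute(const char *digits, int *value)
{
    char *end;
    long parsed;
    assert(digits != NULL && value != NULL);
    errno = 0;
    parsed = strtol(digits, &end, 10);
    if (end == digits || *end != '\0')
        return 0;
    if (errno == ERANGE || parsed < 0 || parsed > INT_MAX)
        return 0;
    *value = (int)parsed;
    return 1;
}
```

A scanner action for integer constants could store the resulting value in the global attribute variable before returning the token number.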
Implementation
In the case of identifiers and string constants, yylval contains an index into a string table that holds the actual string. The same index should be returned for the same identifier appearing at different places. Similarly, the same index is returned for the same string. However, the identifier abc and the string constant 'abc' should return different indices in the string table.
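One possible shape for such a string table is sketched below: lexemes are packed into a character pool, and the index handed to the parser is the offset of the lexeme in that pool. The names intern_lexeme and lexeme_at, the kind tag, and both size limits are assumptions made for illustration. The linear search keeps the sketch short; a hash table would be the usual choice where efficiency matters.

```c
#include <assert.h>
#include <string.h>

#define POOL_SIZE   4096  /* assumed limit on total characters stored */
#define MAX_ENTRIES 512   /* assumed limit on distinct lexemes */

enum LexemeKind { KIND_ID, KIND_STRING };

static char pool[POOL_SIZE];
static int pool_used = 0;
static struct { int offset; enum LexemeKind kind; } entries[MAX_ENTRIES];
static int entry_count = 0;

/* Returns the pool offset of the lexeme, storing it on first sight.
   The kind tag keeps an identifier and a string constant with the same
   spelling apart. Returns -1 when either limit is reached. */
int intern_lexeme(const char *lexeme, enum LexemeKind kind)
{
    size_t len;
    int i;
    assert(lexeme != NULL);
    for (i = 0; i < entry_count; i++) {
        if (entries[i].kind == kind &&
            strcmp(pool + entries[i].offset, lexeme) == 0)
            return entries[i].offset;
    }
    len = strlen(lexeme);
    if (entry_count == MAX_ENTRIES ||
        (size_t)pool_used + len + 1 > (size_t)POOL_SIZE)
        return -1;
    entries[entry_count].offset = pool_used;
    entries[entry_count].kind = kind;
    memcpy(pool + pool_used, lexeme, len + 1);
    pool_used += (int)(len + 1);
    return entries[entry_count++].offset;
}

/* Gives later passes access to the stored lexeme. */
const char *lexeme_at(int index)
{
    assert(index >= 0 && index < pool_used);
    return pool + index;
}
```

With this layout, interning the identifier abc twice yields the same index both times, while interning the string constant with the same spelling adds a second entry and yields a different index.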
Reserved words may be handled as regular expressions or stored as part of the id table. For example, reserved words may be pre-stored in the string table so your program can distinguish a reserved word from an identifier by the section of the table in which the lexeme is found. Efficiency should be a factor in the management of the lexical and string tables.
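As a rough illustration of the table-based option, the sketch below keeps reserved words in their own sorted section and finds them with the standard bsearch function, so a lookup costs a logarithmic number of comparisons. The three words are taken from the example program in this handout and are only assumed to be reserved; the full list comes from Figure 1. The name reserved_index is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed reserved words, kept in alphabetical order for bsearch. */
static const char *const reserved[] = { "method", "program", "void" };
#define RESERVED_COUNT (sizeof reserved / sizeof reserved[0])

/* bsearch passes the key first and a pointer to a table element second. */
static int compare_words(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Expects a lexeme already folded to lower case. Returns its position in
   the reserved section, or -1 if it is an ordinary identifier. */
int reserved_index(const char *lexeme)
{
    const char *const *hit;
    assert(lexeme != NULL);
    hit = bsearch(lexeme, reserved, RESERVED_COUNT,
                  sizeof reserved[0], compare_words);
    return hit != NULL ? (int)(hit - reserved) : -1;
}
```

A scanner action for identifier-shaped lexemes could call this first and fall back to the string table only when the result is -1.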
You are to write a routine ReportError that takes a message and line and column numbers and reports an error, printing the message and indicating the position of the error. You need only print the line and column number to indicate the position.
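A minimal sketch of such a routine is shown here. The ReportError name and its three parameters follow the description above; the exact wording of the output, the use of stderr, and the helper FormatError are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Builds the error text in buf and returns the length snprintf reports.
   Kept separate from printing so the format can be checked in isolation. */
int FormatError(char *buf, size_t cap, const char *message,
                int line, int column)
{
    assert(buf != NULL && message != NULL);
    return snprintf(buf, cap, "Error at line %d, column %d: %s",
                    line, column, message);
}

/* Prints the message together with the position of the error. */
void ReportError(const char *message, int line, int column)
{
    char buf[256];
    FormatError(buf, sizeof buf, message, line, column);
    fprintf(stderr, "%s\n", buf);
}
```

The scanner would need to track the current line and column itself, for example by counting newlines and the length of each matched lexeme.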
The #define mechanism should be used to allow the lexical analyzer to return token numbers symbolically. In order to avoid using token names that are reserved or significant in C or in the parser, the token names have been specified for you in Figure 1.
The parser and the lexical analyzer must agree on the token numbers to ensure correct communication between them. The token numbers can be chosen by you, as the compiler writer, or, by default, by Yacc (a parser generator to be used in
Temporary driver
{ ...
    while (1) {
        switch (yylex()) {
            case ICONTnum: ...
        }
    }
... }
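The driver outline above could be fleshed out roughly as follows. Everything except the yylex loop, the yylval variable and the ICONTnum case label is an assumption: the token numbers are arbitrary, IDnum is a hypothetical name, and yylex is replaced by a stub that replays a fixed token sequence so the sketch compiles without a generated scanner. The loop stops when yylex returns 0, which is how Lex-generated scanners signal end of input.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed token numbers; the real names and values come from Figure 1. */
#define ICONTnum 257
#define IDnum    258   /* hypothetical name */

int yylval;            /* attribute of the most recent token */

/* Stand-in for the generated scanner: replays a fixed sequence. */
static const int stub_tokens[] = { IDnum, ICONTnum, 0 };
static const int stub_values[] = { 0, 42, 0 };
static int stub_pos = 0;

int yylex(void)
{
    int tok = stub_tokens[stub_pos];
    yylval = stub_values[stub_pos];
    if (tok != 0)
        stub_pos++;
    return tok;
}

/* Prints one line per token and returns how many tokens were seen. */
int run_driver(FILE *out)
{
    int count = 0;
    assert(out != NULL);
    for (;;) {
        int tok = yylex();
        if (tok == 0)      /* end of input */
            break;
        switch (tok) {
        case ICONTnum: fprintf(out, "ICONTnum\t%d\n", yylval); break;
        case IDnum:    fprintf(out, "IDnum\t%d\n", yylval);    break;
        default:       fprintf(out, "unknown token %d\n", tok); break;
        }
        count++;
    }
    return count;
}
```

In the real project the stub would be removed and the driver linked against the scanner generated from the Lex specification; a main function would then simply call run_driver(stdout).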
Error handling
An example program with output
The program
/* Example 1: A hello world program */
program xyz;
method void main() {
    System.println('Hello World !!!');
}
Assignment submission
2023-09-30