sT: A Simple T uring Programming Language Programming Assignment 1
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
sT: A Simple Turing Programming Language
Programming Assignment 1
Lexical Definition
Due Date: 1:20PM, May 4, 2023
Your assignment is to write a scanner for the sT language in lex. This document gives the lexical definition of the language, while the syntactic definition and code generation will follow in subsequent assignments.
Your programming assignments are based around this division and later assignments will use the parts of the system you have built in the earlier assignments. That is, in the first assignment you will implement the scanner using lex, in the second assignment you will implement the syntactic definition in yacc, and in the last assignment you will generate assembly code for the Java Virtual Machine by augmenting your yacc parser.
This definition is subject to modification as the semester progresses. You should take care in implemen- tation that the programs you write are well-structured and easily changed.
1 Character Set
sT programs are formed from ASCII characters. Control characters are not used in the definition of the language.
2 Lexical Definitions
Tokens are divided into two classes: tokens that will be passed to the parser and tokens that will be discarded by the scanner (i.e. recognized but not passed to the parser).
2.1 Tokens That Will Be Passed to the Parser
The following tokens will be recognized by the scanner and will be eventually passed to the parser:
Delimiters
Each of these delimiters should be passed back to the parser as a token.
dot comma colon semicolon parentheses square brackets brackets |
. , : ; ( ) [ ] { } |
Arithmetic, Relational, and Logical Operators
Each of these operators should be passed back to the parser as a token.
addition subtraction multiplication division remainder assignment relational logical |
+ - * / mod := < <= >= > = not= and or not |
Keywords
The following keywords are reversed words of sT (Note that the case is significant):
array begin bool char const decreasing default do else end exit false for function get if int loop of put procedure real result return skip string then true var when
Each of these keywords should be passed back to the parser as a token.
Identifiers
An identifier is a string of letters and digits beginning with a letter. Case of letters is relevant, i.e. ident, Ident, and IDENT are different identifiers. Note that keywords are not identifiers.
Numerical Constants
A numerical constant is a sequence of one or more digits and optionally followed by a dot and a sequence of one or more digits.
String Constants
A string constant is a sequence of zero or more ASCII characters appearing between double-quote (”) delimiters. A double-quote appearing with a string must be written twice. For example, ”aa””bb” denotes the string constant aa”bb.
2.2 Tokens that will be discarded
The following tokens will be recognized by the scanner, but should be discarded rather than passing back to the parser.
Whitespace
A sequence of blanks (spaces), tabs, and newlines.
Comments
Comments can be denoted in two ways:
• C-style is text surrounded by “{%” and “%}” delimiters, which may span more than one line;
• C++-style is text following a “%” delimiter running up to the end of the line.
Whichever comment style is encountered first remains in effect until the appropriate comment close is encountered. For example
% a comment % line %} {% with {% delimiters %} before the end and
{% this is a comment % line with some {% and
% delimiters %}
are both valid comments.
3 Symbol Tables
You must implement symbol tables to store all identifiers. Symbols tables should be designed for efficient insertion and retrieval operations, and hence they are usually organized as hash tables. In order to create and manage the tables, at least the following functions should be provided:
create(): Creates a symbol table.
lookup(s): Returns index of the entry for string s, or nil if s is not found.
insert(s): Inserts s into a new entry of the symbol table and returns index of the entry. dump( ): Dumps all entries of the symbol table. returns index of the entry.
4 Implementation Hints
You should write your scanner actions using the macros token, tokenInteger, and tokenString. The macro tokenInteger is used for tokens that return an integer value as well as a token (e.g. integer constants), while tokenString is used for tokens that return a string as well as a token. The macro token is used for all other tokens. The first argument of all three macros is a string. This string names the token that will be passed to the parser. The macro tokenInteger takes a second argument that must be an integer and tokenString takes a second argument that must be a string. Following are some examples:
Token |
Lexeme |
Macro Call |
left parenthesis |
( |
token(’(’) |
IF |
if |
token(IF) |
identifier |
ab123 |
tokenString(identifier, ”ab123”); tokenString(identifier, yytext); |
integer constant |
23 |
tokenInteger(integer, ”23”); |
boolean constant |
true |
token(TRUE); |
5 What Should Your Scanner Do?
Your goal is to have your scanner print each token on a separate line, surrounded by angle brackets. Each line should be listed along with a line number. In addition, the identifiers that are stored in the symbol table must be dumped. For example, given the input:
% print hello world
put "hello world"
Your scanner should output:
1: % print hello world
2: put "hello world"
Symbol Table:
6 lex Template
This template may be found online in compiler/lextemplate.l.
%{
#define MAX_LINE_LENG 256
#define LIST strcat(buf,yytext)
#define token(t) {LIST; printf("<%s>\n","t");}
#define tokenInteger(t,i) {LIST; printf("<%s:%d>\n","t",i);} #define tokenString(t,s) {LIST; printf("<%s:%s>\n","t",s);}
int linenum = 0;
char buf[MAX_LINE_LENG];
%}
%%
"("
\n
[ \t] *
.
%%
{token(’(’);}
{
LIST;
if (Opt_L)
printf("%d: %s", linenum, buf);
linenum++;
buf[0] = ’\0’;
}
{LIST;}
{
LIST;
printf("%d:%s\n", linenum+1, buf);
printf("bad character:’%s’\n",yytext);
exit(- 1);
}
2023-05-28
Lexical Definition