G5035 Compilers and Computer Architecture August 2021
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
BSc and MComp SECOND YEAR EXAMINATION
Compilers and Computer Architecture
G5035
August 2021
1. This question is about regular expressions.
(a) For each of the following statements about FSAs (= finite state automata), state whether they are true or false. Correct answers are awarded 3 marks; wrong answers are penalised by 1 mark (to a minimum of 0 marks for this question). Omitted answers do not contribute positively or negatively to your grade.
i. The empty string is in the language of an FSA only if the initial state is an accepting state.
ii. The FSA depicted below is deterministic.
iii. The language accepted by the FSA above is 0+.
iv. The language of a deterministic FSA with exactly one state is either the empty set, or an infinite set.
(b) In the following question, draw diagrams of finite state automata (= FSAs) with the properties described.
i. Draw a diagram of an FSA which accepts the empty string, but whose initial state is not an accepting state.
ii. Draw a diagram of a deterministic FSA which accepts the language LANG(0001|(0|1)* 0). Please provide a loose description for your construction.
iii. Draw a diagram of a deterministic FSA which accepts the language LANG((0|1)* 1(0|1)(0|1)). Please provide a loose description for your construction.
iv. Draw a diagram of an FSA which has only one state, and which accepts only finitely many strings.
(c) Let A be the alphabet {8, y}. For each of the following statements about regular expressions (over A, unless otherwise specified), state whether they are true or false. Correct answers are awarded 1 marks; wrong answers are penalised by 1 mark (to a minimum of 0 marks for this question). Omitted answers do not contribute positively or negatively to your grade.
i. The string yy is in the language of (yyy)* (8 |y)+.
ii. The empty string is in the language of 8 |y * .
iii. The string 8y888 is in the language of (8 |y)* 8(8 |y)(8 |y)(8 |y)(8 |y) .
iv. The string 888yyyy is in the language of 8+8+y+y+y+y+ .
v. 88(8 |y)+ is a regular expression of the alphabet {8, y, o, 6}.
vi. A deterministic FSA cannot accept the language of any regular expression which involves the empty string e.
vii. Let R be a regular expression over A. Then the language of R* must be infinite.
viii. Let R be a regular expression over A. Then the languages of R and RR must be distinct.
2. This question is about syntactic analysis.
(a) For a CFG (= context free grammar) g = (A, V, s, R), recall that a one- step reduction s乞 = s乞+1 is a transformation of a string s乞 over alphabet characters and variables, in which a variable symbol x in s乞 is replaced by a string s such that x → s is a reduction in R.
i. Define parse trees, by answering the following questions:
❼ What is the root of the tree?
❼ How is the tree constructed?
❼ What property do the leaves in the tree have?
❼ How does the parse tree represent a string in lang(g)?
ii. Briefly define an ambiguous grammar.
(b) Consider the following grammar,
E → E + E │ E * E │ NUM
NUM → (a number)
and the following derivation from this grammar:
E = E * E = E + E * E
= E + NUM * E = E + NUM * NUM = NUM + NUM * NUM = NUM + 3 * NUM = NUM + 3 * 4 = 2 + 3 * 4
i. Construct the parse tree for the derivation.
ii. If we interpret this string as evaluating a formula of arithmetic operations on integers in the usual way, what value should we
assign this string based on this derivation?
iii. Write the corresponding right-most derivation.
(c) Consider the following fragment of a programming language.
STMT → IFSTMT │ PROC │ . . .
IFSTMT → if EXPR then STMT else STMT │ . . . PROC → IDEN ( ) │ IDEN ( ARGS )
ARGS → EXPR │ EXPR , ARGS
EXPR → NUM │ STRING │ EXPR < EXPR │ . . .
IDEN → (an identifier consisting of alphabetic characters) NUM → (a number)
STRING → “(any string not including quotation marks) ”
Write pseudocode to describe data types for an AST (= abstract syntax tree) for this part of the language, and instantiate them in such a way to represent the following program fragment.
if 2 < 3 then print "yes" else print "no"
In your solution, indicate which of the objects (among the ones that you initialise) represents the root of the AST.
3. This question is about semantic analysis.
(a) Describe the role of the symbol table.
i. Provide a brief description of the information which may be stored in a symbol table.
ii. What function does a symbol table provide, that the context free grammar of a programming language lacks?
— The remaining two parts of this question are about type-checking for a programming language with the following grammar:
Here, ID, BOOL, and INT, and should be interpreted as variables which expand respectively to identifiers with purely alphabetic names (such as “x” and “fibonacci”); the constants true and false; and non-negative integers (such as “0”, “ 1”, and “290”). Types for this language are given by the following
CFG:
8 ::= decl │ bool │ int │ bool -> 8 │ int -> 8
Here, decl is the type which is assigned to function declaration statements def f ( x:8 ) = . . . (as opposed to the function f defined by such a statement), and bool and int are the types of boolean and integer expressions respectively. For types 8 and y, the type 8 -> y is the type of a function taking input of type 8 and producing a result of type y .
We assume the following interpretation of program fragments:
❼ f ( ○ ), for an identifier f of type 8 -> y and ○ of type 8, evaluates a function f on an argument ○, producing a value of type y .
❼ ( ○ ) is a way to group an expression to favour a particular order of
evaluation, and has the same type as ○ .
❼ ○ < R is a boolean expression on integers which evaluates to true precisely when the value expressed by ○ is less than that of R.
❼ ○ = R is a boolean expression on integers which evaluates to true precisely when the values expressed by ○ and R are equal.
❼ ○ + R is an integer expression, consisting of the sum of the values of two integer expressions ○ and R.
❼ ○ ; R is the sequential composition of ○ and R : first ○ is evaluated, and when this evaluation terminates, R is evaluated (so that the type of ○ ; R is the same as that of R).
❼ if E then ○ else R is an if statement, which (provided that ○ and R have the same type, and that T is of type bool) evaluates ○ if T is true and evaluates R if T is false.
❼ def f ( α :8 ) = ○ is a function declaration, where the declaration itself has type decl, and which defines a function f taking an argument α of type 8 and evaluating ○ .
(b) In this language, consider the following (possibly invalid) program: if if 0=1 then 2 else 3 = if 4<5 then 6 else 7 then 8 else 9
Explain either how this program could possibly pass type-checking and be assigned a type, or why this program cannot possibly pass type- checking or be assigned a type. Describe any syntactical issues that play a role in your answer.
(c) For the language described above, provide inference rules of the form
which determine the types of each of the following reductions for P (i.e., for the following statement types), where Γ ranges over all symbol tables.
i. Write the rules of inference for boolean and integer constants.
ii. Write the rules of inference for order comparisons ○ < R and additions ○ + R.
iii. Write the rules of inference for parenthesized expressions ( ○ ) and sequences of statements ○ ; R.
iv. Write the rules of inference for function evaluation f ( ○ ) and if statements if E then ○ else R.
v. Write the rule of inference for function declaration, def f ( α:8 ) = ○ .
Use greek letters / the letters s , T, U , etc. to indicate generic types, other capital letters P , ○ , E , etc. to indicate sub-expressions, and lower- case letters to indicate generic identifier names.
2022-01-07