G5035 Compilers and Computer Architecture 2021
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
G5035
Compilers and Computer Architecture
2021
1. This question is about regular expressions.
(a) For each of the following statements about FSAs (= finite state automata), state whether they are true or false. Correct answers are awarded 3 marks; wrong answers are penalised by 1 mark (to a minimum of 0 marks for this question). Omitted answers do not contribute positively or negatively to your grade. [12 marks]
i. The empty string is in the language of an FSA only if the initial state is an accepting state.
ii. The FSA depicted below is deterministic.
|
0
1 |
0
0
1 |
iii. The language accepted by the FSA above is 0+.
iv. The language of a deterministic FSA with exactly one state is either the empty set, or an infinite set.
(b) In the following question, draw diagrams of finite state automata (= FSAs) with the properties described.
i. Draw a diagram of an FSA which accepts the empty string, but whose initial state is not an accepting state. [2 marks]
ii. Draw a diagram of a deterministic FSA which accepts the language LANG(0001|(0|1)* 0). Please provide a loose description for your construction. [12 marks]
iii. Draw a diagram of a deterministic FSA which accepts the language LANG((0|1)* 1(0|1)(0|1)). Please provide a loose description for your construction. [12 marks]
iv. Draw a diagram of an FSA which has only one state, and which accepts only finitely many strings. [4 marks]
(c) Let A be the alphabet {α← β}. For each of the following statements about regular expressions (over A, unless otherwise specified), state whether they are true or false. Correct answers are awarded 1 marks; wrong answers are penalised by 1 mark (to a minimum of 0 marks for this
question). Omitted answers do not contribute positively or negatively to your grade. [8 marks]
i. The string ββ is in the language of (βββ)* (α|β)+.
ii. The empty string is in the language of α|β * .
iii. The string αβααα is in the language of (α|β)* α(α|β)(α|β)(α|β)(α|β).
iv. The string αααββββ is in the language of α+α+β +β +β +β + .
v. αα(α|β)+ is a regular expression of the alphabet {α← β← 礻← 6}.
vi. A deterministic FSA cannot accept the language of any regular expression which involves the empty string e.
vii. Let R be a regular expression over A. Then the language of R* must be infinite.
viii. Let R be a regular expression over A. Then the languages of R and RR must be distinct.
2. This question is about syntactic analysis.
(a) For a CFG (= context free grammar) g = (A← V← s← R), recall that a one- step reduction 5i = 5i+1 is a transformation of a string 5i over alphabet characters and variables, in which a variable symbol x in 5i is replaced by a string 5 such that x → 5 is a reduction in R.
i. Define parse trees, by answering the following questions: [10 marks] ❼ What is the root of the tree?
❼ How is the tree constructed?
❼ What property do the leaves in the tree have?
❼ How does the parse tree represent a string in lang(g)?
ii. Briefly define an ambiguous grammar. [2 marks]
(b) Consider the following grammar,
E → E + E │ E * E │ NUM
NUM → (a number)
and the following derivation from this grammar:
E = E * E = E + E * E
= E + NUM * E = E + NUM * NUM = NUM + NUM * NUM = NUM + 3 * NUM = NUM + 3 * 4 = 2 + 3 * 4
i. Construct the parse tree for the derivation. [8 marks]
ii. If we interpret this string as evaluating a formula of arithmetic operations on integers in the usual way, what value should we
assign this string based on this derivation? [2 marks]
iii. Write the corresponding right-most derivation. [4 marks]
(c) Consider the following fragment of a programming language.
STMT → IFSTMT │ PROC │ . . .
IFSTMT → if EXPR then STMT else STMT │ . . . PROC → IDEN ( ) │ IDEN ( ARGS )
ARGS → EXPR │ EXPR , ARGS
EXPR → NUM │ STRING │ EXPR ← EXPR │ . . .
IDEN → (an identifier consisting of alphabetic characters) NUM → (a number)
STRING → “(any string not including quotation marks) ”
Write pseudocode to describe data types for an AST (= abstract syntax tree) for this part of the language, and instantiate them in such a way to represent the following program fragment.
if 2 < 3 then print "yes" else print "no"
In your solution, indicate which of the objects (among the ones that you initialise) represents the root of the AST. [24 marks]
3. This question is about semantic analysis.
(a) Describe the role of the symbol table.
i. Provide a brief description of the information which may be stored in a symbol table. [4 marks]
ii. What function does a symbol table provide, that the context free grammar of a programming language lacks? [6 marks]
— The remaining two parts of this question are about type-checking for a programming language with the following grammar:
P ::= ID │ BOOL │ INT │ ID ( P ) │ ( P ) │ P < P │ P = P │ P + P │ P ; P
│ if P then P else P │ def ID ( ID : α ) = P
Here, ID, BOOL, and INT, and should be interpreted as variables which expand respectively to identifiers with purely alphabetic names (such as “x” and “fibonacci”); the constants true and false; and non-negative integers (such as “0”, “ 1”, and “290”). Types for this language are given by the following
CFG:
α ::= decl │ bool │ int │ bool -> α │ int -> α
Here, decl is the type which is assigned to function declaration statements def f ( x: α ) = . . . (as opposed to the function f defined by such a statement), and bool and int are the types of boolean and integer expressions respectively. For types α and β, the type α -> β is the type of a function taking input of type α and producing a result of type β .
We assume the following interpretation of program fragments:
❼ f ( Q ), for an identifier f of type α -> β and Q of type α, evaluates a function f on an argument Q, producing a value of type β .
❼ ( Q ) is a way to group an expression to favour a particular order of
evaluation, and has the same type as Q.
❼ Q < R is a boolean expression on integers which evaluates to true precisely when the value expressed by Q is less than that of R.
❼ Q = R is a boolean expression on integers which evaluates to true precisely when the values expressed by Q and R are equal.
❼ Q + R is an integer expression, consisting of the sum of the values of two integer expressions Q and R.
❼ Q ; R is the sequential composition of Q and R : first Q is evaluated, and when this evaluation terminates, R is evaluated (so that the type of Q ; R is the same as that of R).
❼ if E then ○ else R is an if statement, which (provided that ○ and R have the same type, and that T is of type bool) evaluates ○ if T is true and evaluates R if T is false.
❼ def j ( α : α ) = ○ is a function declaration, where the declaration itself has type decl, and which defines a function j taking an argument α of type α and evaluating ○ .
(b) In this language, consider the following (possibly invalid) program: if if 0=1 then 2 else 3 = if 4<5 then 6 else 7 then 8 else 9
Explain either how this program could possibly pass type-checking and be assigned a type, or why this program cannot possibly pass type- checking or be assigned a type. Describe any syntactical issues that play a role in your answer. [10 marks]
(c) For the language described above, provide inference rules of the form
Γ 上 ○ :Γ(α)l上 上 R : αk e.g. ← Γ 上 Γ(○) 上(:) bo(上)ol(R) : int
which determine the types of each of the following reductions for P (i.e., for the following statement types), where Γ ranges over all symbol tables.
i. Write the rules of inference for boolean and integer constants. [4 marks]
ii. Write the rules of inference for order comparisons ○ < R and additions ○ + R. [2 marks]
iii. Write the rules of inference for parenthesized expressions ( ○ ) and sequences of statements ○ ; R. [6 marks]
iv. Write the rules of inference for function evaluation j ( ○ ) and if statements if E then ○ else R. [10 marks]
v. Write the rule of inference for function declaration, def j ( α:α ) = ○ . [8 marks]
Use greek letters / the letters s, T, U , etc. to indicate generic types, other capital letters P , ○ , E , etc. to indicate sub-expressions, and lower- case letters to indicate generic identifier names.
2022-07-15