G5035 Compilers and Computer Architecture 2022
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
BSc and MComp SECOND YEAR EXAMINATION
Compilers and Computer Architecture
G5035
1. This question is about lexer and parsers, regular expressions and languages, and context-free grammars (CFGs).
(a) For each of the following statements about languages (i.e. sets of words), state whether they are true or false. Correct answers are awarded 2 marks; wrong answers are penalised by 1 mark (to a minimum of 0 marks for this question). Omitted answers do not contribute to the marks. [20 marks]
i. If A* = A, then A = {ϵ}.
ii. For all A and B , (AB)* = A* B* .
iii. For all A, B , C , (A n B)C = (AC) n (BC).
iv. For all A, B , C , (AB) n C = (A n C)(B n C).
v. L = {an bn : 0 ≤ n ≤ 4} is regular.
vi. If α and β are regular expressions, then (αβ)* α = α(βα)* .
vii. L = {w = xyzy : x,y,z ∈ {0, 1}+ } is regular.
viii. If A and B are not regular, then A n B is not regular.
ix. If A and C are regular and A ⊆ B ⊆ C , then B must be regular.
x. If A* is regular, then A is regular.
(b) Explain why we usually use CFGs rather than regular expressions or finite state automata to specify the syntactic structures of programming languages (as opposed to the lexical structures of programming languages). [5 marks]
(c) The lexer and parser of a compiler can be written “by hand” . What is a popular alternative to hand-writing lexer and parser? List the advantages and shortcomings of either approach. [10 marks]
(d) Consider the following context-free grammar:
value
: ’IF’ expr ’THEN’ expr ’ELSE’ expr
| ’IF’ expr ’THEN’ expr
| expr
;
expr
: expr ’+’ term
| term
;
term
: ’a’ | ’b’ | ’c’ | ’d’ | ’e’
;
Ignoring whitespaces, draw a parse tree for
IF a THEN b + c + d ELSE e [5 marks]
(e) The grammar has several undesirable features—what are they? Illustrate them by examples as necessary, and explain why they are problematic. [5 marks]
(f) What could be done to overcome the problems in the previous question? [5 marks]
2. This question is about code generation.
(a) What data structure is used as input to code generators? What feature of it makes the implementation of code generators simple and efficient? [5 marks]
(b) Some compilers have a semantic analysis phase after lexing and parsing but before code generation. What is the purpose of this phase, and why is semantic analysis not done in the parsing stage? What about languages that don’t have semantic analysis at all (e.g. Python or JavaScript)? [5 marks]
(c) Explain the difference between type checking and type inference. [10 marks]
(d) Many compilers don’t generate machine code directly after lexing, parsing and semantic analysis. Instead they first generate machine-independent code, which is later, possibly after optimisation, used to generate code for the target architecture. Explain the characteristics of machine-independent code, and the advantages of using them as an intermediate layer in compilation. [10 marks]
(e) Describe a scheme for the translation of function definitions and function calls into low-level code (e.g., RISC-V or stack machine code) based on a stack, as discussed in the lectures and the third task of the coursework. Assume that in the source language function calls are of the form
f(e1, . . . , en)
where e1, . . . , en are expressions, function definitions are of the form
f(x1, . . . , xn) = BLOCK
where BLOCK itself is an expression, and any expression evaluates to a 32- bit (i.e. 4-byte) integer. We also assume that code generation for other kinds of expressions is readily available, e.g., one can use genExp(BLOCK) and genExp(e1) directly. If you use uncommon instructions, please explain their semantics briefly. If you use activation records in your explanation, describe the layout of your activation records. Note that your answer comprises two parts: code generation for function definitions, and code generation for function calls. [20 marks]
3. This question is about computer architecture and low-level programming.
(a) Compare and contrast CISC with RISC architectures. [10 marks]
(b) What is the meaning of data locality in the execution of a program? Give an example of a program fragment that exhibits a high degree of data locality. Explain why your example exhibits data locality. [10 marks]
(c) Real CPUs have a small number of fast registers, typically between 8 and 256. Explain why this small number of registers makes compilation more complicated in comparison with compilation to stack machines. Outline a compilation strategy for machines with a finite number of registers. You don’t have to present a full code generator, just explain the key features and ideas. [10 marks]
(d) In this question, you will write RISC-V assembly programs for two different tasks. Your programs may only use the registers x0, a0, t0, t1, and sp, and the following RISC-V assembly commands:
lw register1 offset (register2 ) load a word of data at offset from an address stored inregister2 , intoregister1 .
sw register1 offset (register2 )
store the value of register1 into memory at offset from an address stored in register2 .
add |
register1 |
register2 |
register3 |
add the values stored in register3 and register2 , and store the result in register1 . |
addi |
register1 |
register2 |
constant |
add the signed 16-bit value constant to the value stored in register2 and and store the result in register1 . |
mul |
register1 |
register2 |
register3 |
multiply the values stored in register3 and register2 , and store the (lower 32-bit of the) result in register1 . |
b |
address |
|
|
jump to the instruction at address . |
beq |
register1 |
register2 |
address |
if the contents of register1 and register2 are equal, jump to the instruction at address . |
bne |
register1 |
register2 |
address |
if the contents of register1 and register2 are different, jump to the instruction at address . |
Each instruction is to be written on a separate line. Use the convention that the stack grows ‘downwards’ (i.e. the top of the stack is at a lower address than the rest of the stack) and that sp points at an unused word in memory just over the top of the stack. We also ignore issues such as overflow or alignment.
• Write a program that pops two integers from the stack, adds them, and pushes the result back onto the stack. [10 marks]
• Write a program that pops an integer n from the stack and computes the factorial of n, i.e. n! = n∗(n−1)∗ ... ∗2∗ 1 (assume that n > 0), and pushes it back onto the stack. [10 marks]
2023-01-05