Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

COMP2022/2922

Tutorial on Context-free Grammars

S2 2021

In the lecture we saw a new model of computation called ’context-free grammar’ (CFG). A CFG generates strings by rewriting variables (starting with the ’start’ variable) until there are none left.   The language of a CFG is called a ’context-free language’ .   These include all the regular languages, and more.  CFGs are central to describing the syntax of many programming languages. We ended the lecture with the fact that some CFGs can generate a string with more than one leftmost derivation (= more than one rightmost derivation, = more than one parse tree). Such strings are called ambiguous, and correspond, intuitively, to different ”meanings” of the string.

After this tutorial you should be able to:

Convert an English or mathematical description of a context-free language into a CFG, and argue why your grammar is correct.

Describe in English or mathematics the language of a CFG, and argue why your description is correct.

Argue if a CFG is ambiguous or not.

Problem 1. For each of the following grammars:

Indicate the set of variables, terminals, production rules, and the start variable.

Give two strings derivable by this grammar, and two strings that are not.

Describe in one sentence the language generated by the grammar.

Is the language regular?

1.

S → 0S0 | 1

2.

S → X1X

X → 0X | ϵ

Problem 2. Consider the following grammar:

E → E + T | E − T |  T

T → F × T | T/T | F

F → (E) | V | C

V → a | b | c

C → 1 | 2 | 3

1. Indicate the set of variables, terminals, production rules, and the start vari- able.

2. Give a left-most derivation of the string a + b × c

3. Give a right-most derivation of the string a + b × c

4. Give a parse tree for a × b − 2 × c

5. Give a parse tree for a x (b — 2 x c)

Problem 3. In this problem we see that it is easy to design a CFG that misses some strings. Consider the CFG G\:

S 一 Sab | aSb | abS | Sba | bSa | baS | ϵ

1. Show that if x e L(G\) then x has the same number of as as bs.

2. Show that there is a string with the same number of as as bs that is not in L(G\).

Problem 4.

1. Write a CFG for the language of balanced parentheses. So, e.g., the follow- ing strings are in the language:  ()(()) and (()) and ϵ, while the following are not: )() and (())). Explain each rule in at most two sentences.

2. Show this language is not regular.

Problem 5. (Exam 2022) Fix Σ = {0, 1}. Provide a context-free grammar for the language

{u0v1w : len(v) = len(u) + len(w)}

For instance, taking u = 01, w = 11, v = 1100 we get that u0v1w = 0101100111 is in the language.

Give a short explanation/justification of your answer.

Problem 6. Some programming languages define the if statement in similar ways to the following grammar:

Conditional 一  if Condition then Statement

Conditional if Condition then Statement else Statement Statement Conditional  | S1  | S2

Condition C1  | C2

1. Show that this grammar is ambiguous.

2. Show that the string you provided can be interpreted in two different ways, i.e., resulting in programs with different behaviours. You may want to write the programs corresponding to each derivation (hint: you can read these off from the parse trees).

3. Write a CFG that captures if statements but is not ambiguous.

Additional problems

Problem 7. I asked GPT-4 to write recursive code for the language of well- formed parentheses.

It gave me this:

1    def   is_well_formed(s):

2              #  Base   case :  An   empty   string   is  well - formed

3              if  not   s:

4                       return   True

5              #  If   string   starts   with   ’(’  and   ends   with   ’)’

6              if   s[0]   ==   ’ ( ’  and   s[-1]   ==   ’ ) ’ :

7                       return   is_well_formed(s[1:-1])

8 return False

Is this correct?

Problem 8. Consider the following context-free grammar:

S → ST

S → a

T → BS

B → +

For each of the following strings, say whether it is generated by the grammar, and if so, give a derivation:

• +a + a • a + a

a + a + a

a + a + +

Problem 9. Describe the language generated by the following context-free gram- mar:

S → X1Y

X → ϵ | X0

Y → ϵ | 1Y | Y0

Briefly explain why your answer is correct.

Is the grammar ambiguous?

Problem 10. Consider the following context-free grammar:

S → 0S  | 0S1S | ϵ

For each of the following words, say whether or not it is generated by the gram- mar:  001, 0101, 0110.  The only justification that is needed is a derivation for those words that are generated by the grammar.

Show that the grammar is ambiguous.

Problem 11. (Exam 2022) Fix Σ = {0, 1, #}. Provide a context-free grammar for the language of strings of the form u#v where u, v ∈ {0, 1}∗ and |u | =  |v | and the reverse of u is not equal to v.

For instance, 01#11, 011#011 is in the language, while #, 0#0, 01#10 are not. Give a short explanation/justification of your answer.

Problem 12. A CFG is called right-regular if every rule is of the form A → a or A → aB where A, B are arbitrary nonterminals and a is an arbitrary terminal or ϵ . A CFG is called left-regular if every rule is of the form A → a or A → Ba where A, B are arbitrary nonterminals and a is an arbitrary terminal or ϵ. A CFG is regular if it is left-regular or right-regular.

1. Write a right-regular grammar generating the language a | bc. 2. Write a left-regular grammar generating the language a | bc.

3. Find a grammar that only has rules of the form A  → a  or A  → aB  or A → Ba, and that generates a language that is not the language of any RE (very hard, but we will see how to do this later).

4. Show that the regular grammars generated exactly the languages of regular expressions (very hard, but we will see how to do this later).

Problem 13. For each of the following languages, find a CFG that generates it:

1.  {bb, bbbb, bbbbbb, ...} = {(bb)n+1  | n ∈ N}

2.  {a, ba, bba, bbba, ...} = {bn a | n N}

3.  {ε, ab, abab, ...} = {(ab)n  | n N}

4.  {ac, abc, abbbc, ...} = {abn c | n N}

5.  {ambn  | n, m N}

6.  {ambn  | n, m N, m > 0}

7.  {ambn  | n, m N, m > 0, n > 0}

8. {an bn  : n > 0}

Problem 14. Give a context-free grammar which generates just the valid identi- fiers. A valid identifier begins with a letter or underscore, contains only letters, underscores, or digits, and is ended by a blank.

Problem 15. Give a CFG that generates the language of propositional formulas over the atoms p, q. Make sure to state what the variables and terminals are.

Problem 16. Provide a context-free grammar for the following language L: the set of well-bracketed strings that use two different types of brackets,i.e., () and {}.  For instance, ({})() is in the language, as is (()), as is (){}, but (} is not. To justify your answer you should briefly explain a) why your grammar only generates strings in L, and b) why your grammar generates all strings in L.

Problem 17. Consider the following grammars.

1.  S → a  | SbS

2. S aS  | Sa  | a

3.  S → S[S]S | ε

4.  S → 0S1 | 01

5.  S +SS | −SS  | a

6.  S → SS+ | SS∗ | a

7.  S → S(S)S | ϵ

8.  S → aSbS  | bSaS  | ϵ

9.  S a  | S + S  | SS | S | (S)

For each:

Describe in English the language generated, and justify your answer.

Say if the language is ambiguous, and justify your answer.

If ambiguous, try find an equivalent unambiguous grammar.

Problem 18. Construct unambiguous CFGs for each of the following languages. In each case show that your grammar is correct.

1. Arithmetic expressions in postfix notation.

2. Left-associative lists of identifiers separated by commas.

3. Right-associative lists of identifies separated by commas.

4. Arithmetic expressions of integers and identifiers with the four binary op- erations +, −, ∗, /.

5. Add unary plus and minus to the arithmetic operations of the previous language.

Problem 19. Find a syntactic condition on the structure of a grammar that guar- antees that the language generated by the grammar is infinite (i.e., contains in- finitely many strings).

Problem 20. Here is an application of grammars to Compilers.

Syntax-directed translation is done by attaching rules or programs fragments to productions in a grammar.

Consider the CFG generating the set of binary strings:

S   →   0

S   →   1

S   →   S0

S   →   S1

Attach to each rule the following pseudocode:

S →    0    S.val = 0

S →    1    S.val = 1

S →   S0   S.val = S.val × 2

S →   S1   S.val = S.val × 2 + 1

Write an algorithm that will take a parse-tree for a string as input, do a bottom- up traversal of the tree (i.e., visit a node only after visiting all its children), and when visiting an internal node of the tree, executes the corresponding command. Once the traversal is complete, what is the content of S.val?

Problem 21. Construct syntax-guided translations between notations of arith- metic expressions:

1. From infix notation to prefix notation.

2. From infix notation to prefix notation.

3. From postfix notation to infix notation.

Problem 22. In lectures we said that the following CFG G generates theset L of strings with the same number of as as bs.

S → ϵ | aSbS | bSaS

Why is this true?

Problem 23. Show that the set of strings over the alphabet {(, )} accepted by the function check is the set of strings generated by the grammar S → SS |  (S) | ε .

Figure 1: Pseudocode

1    def   check(myStr):

2            counter   =  0

3              for   i   in  myStr:

4                          if i == "(":

5                                 counter   += 1

6                       elif   i  ==   " ) " :

7                                 if   counter   >  0:

8                                           counter   -=   1

9                                 else :

10                                          return   " Reject "

11              if   counter   ==   0:

12                       return   " Accept "

13              else :

14                       return   " Reject "