COMP SCI 4817, 7417 and 4417 Applied Natural Language Processing Semester 1 2021
Text preprocessing, Language Models and Other Basic Concepts in NLP
Question 1
(a) Please describe what a tri-gram language model is (2 points) and how to calculate the probability of a sentence (w1, w2, ..., wn) based on the tri-gram language model (2 points). [4 marks]
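As an illustration of the kind of computation part (a) refers to, the following is a minimal sketch of a maximum-likelihood tri-gram model without smoothing. The padding symbols `<s>` and `</s>`, the function names and the toy corpus are assumptions made for this example and are not part of the exam.

```python
from collections import Counter


def train_trigram(sentences):
    """Count tri-grams and their two-word contexts in tokenized sentences."""
    tri, ctx = Counter(), Counter()
    for sent in sentences:
        # Assumed convention: two start symbols and one end symbol.
        padded = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx


def sentence_prob(sentence, tri, ctx):
    """Product over i of P(w_i | w_{i-2}, w_{i-1}), estimated by relative frequency."""
    padded = ["<s>", "<s>"] + sentence + ["</s>"]
    prob = 1.0
    for i in range(2, len(padded)):
        context = tuple(padded[i - 2:i])
        if ctx[context] == 0:
            return 0.0  # unseen context: the unsmoothed estimate is undefined, treated as 0
        prob *= tri[tuple(padded[i - 2:i + 1])] / ctx[context]
    return prob


# Hypothetical toy corpus, for illustration only.
TOY_CORPUS = [["the", "cat", "sat"], ["the", "cat", "ran"]]
TRI, CTX = train_trigram(TOY_CORPUS)
```

In practice the product is usually accumulated as a sum of log-probabilities, and some form of smoothing is applied so that unseen tri-grams do not force the sentence probability to zero.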
(b) (1) Please describe why we need to use Byte-pair Encoding (BPE) (3 points).
(2) Given the vocabulary shown in the following table, what are the tokens in the vocabulary after executing three steps of the BPE algorithm? (4 points)
Words | Occurrence Frequency
mat | 3
bat | 4
cat | 2
make | 5
[7 marks]
(c) What is the edit distance between the strings “cat” and “ant”? Please show the steps of your calculation.
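For reference, a minimal sketch of the standard dynamic-programming computation of minimum edit distance is given below. The function name and the `sub_cost` parameter are assumptions for this example; insertions and deletions cost 1, and the substitution cost is left configurable because some textbooks use 1 and others use 2.

```python
def edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum edit distance with insert/delete cost 1 and a configurable substitution cost."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] holds the cost of turning source[:i] into target[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete the first i characters of source
    for j in range(cols):
        dist[0][j] = j  # insert the first j characters of target
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[-1][-1]
```

Filling in the same table by hand, row by row, is one way to show the steps of the calculation.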
(d) True or False. Please write down your judgement and give an explanation.
• Claim 1: Removing stop words does not always lead to improved performance of the NLP system.
• Claim 2: Language models with low perplexity do not necessarily lead to better performance on downstream tasks.
[6 marks]
[Total for Question 1: 21 marks]
Naive Bayes, Logistic Regression and Machine Learning for NLP
Question 2
(a) Given the following short movie reviews, each labeled with a genre, either comedy or action:
document (tokenized) | label
fun, couple, love, love | comedy
fast, furious, shoot | action
couple, fly, fast, fun, fun | comedy
furious, shoot, shoot, fun | action
fly, fast, shoot, love | action
and a new document D: fast, couple, shoot, fly,
compute the most likely class for D. Assume a naive Bayes classifier and use add-1 smoothing for the likelihoods. [6 marks]
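The computation in part (a) can be sketched in code as follows. This is a minimal multinomial naive Bayes with add-1 smoothing, working in log space. The function names are assumptions for this example, and the training data is copied from the table above.

```python
import math
from collections import Counter

# Training documents copied from the table in part (a).
TRAIN = [
    (["fun", "couple", "love", "love"], "comedy"),
    (["fast", "furious", "shoot"], "action"),
    (["couple", "fly", "fast", "fun", "fun"], "comedy"),
    (["furious", "shoot", "shoot", "fun"], "action"),
    (["fly", "fast", "shoot", "love"], "action"),
]
NEW_DOC = ["fast", "couple", "shoot", "fly"]


def train_naive_bayes(docs):
    """Return (log-priors, per-class word counts, vocabulary)."""
    class_docs = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_docs}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {tok for tokens, _ in docs for tok in tokens}
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    return log_prior, word_counts, vocab


def score(tokens, label, log_prior, word_counts, vocab):
    """log P(label) + sum of log P(token | label) with add-1 smoothing."""
    total = sum(word_counts[label].values())
    result = log_prior[label]
    for tok in tokens:
        if tok not in vocab:
            continue  # assumption: words unseen in training are ignored
        result += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
    return result


def classify(tokens, model):
    """Pick the class with the highest smoothed log-score."""
    log_prior, word_counts, vocab = model
    return max(log_prior, key=lambda c: score(tokens, c, log_prior, word_counts, vocab))
```

Calling `classify(NEW_DOC, train_naive_bayes(TRAIN))` returns the predicted label, and `math.exp(score(...))` recovers the unnormalized probability for each class, which can be checked against a hand calculation.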
(b) Based on the document representation obtained from the bag-of-words model, both logistic regression and naive Bayes can be used to build a text classifier. Please describe the advantage of using logistic regression over naive Bayes. [4 marks]
(c) Please describe the advantage of using Character-level CNN for text classification. [3 marks]
(d) What is beam search, and why should we use beam search in the decoding step of a sequence-to-sequence model?
[4 marks]
[Total for Question 2: 17 marks]
Meaning Representation in NLP
Question 3
(a) Please judge whether the following statement is True or False. You also need to give the reason for your decision.
True or False: Skip-gram is learned with the task of predicting a word from its surrounding context, in other words, the words before and after the word. [4 marks]
(b) Please name two contextualized word embedding approaches and describe the advantage of contextualized word embeddings in comparison with traditional word embeddings such as skip-gram. [4 marks]
(c) Please describe how the skip-thought vector model is trained. Your description needs to include the training objective function of the model. [5 marks]
(d) Give First Order Logic translations for the following sentences:
• 1. Vegetarians do not eat meat.
• 2. Not all vegetarians eat eggs. [6 marks]
(e) Please describe the differences between the semantic representation methods in PropBank and FrameNet. [4 marks]
[Total for Question 3: 23 marks]
Syntactic Parsing
Question 4
Rule | Probability
VP → VP PP | 0.7
VP → VB PP | 0.3
PP → IN DT NN | 0.6
PP → IN NP | 0.4
NP → NP PP | 0.2
NP → DT NN | 0.7
NP → PRP | 0.1

Rule | Probability
PRP → he | 1.0
VB → drove | 0.3
IN → down | 0.1
IN → in | 0.3
DT → the | 0.9
NN → street | 0.9
NN → car | 0.9
(1) Which is the preferred parsing result given the PCFG? Please show your calculation. (6 points)
(2) A parse tree can also be represented as a set of labelled spans. For each tree, please write down the spans of length greater than 1 and their associated labels. (4 points)
[10 marks]
(b) Given the dependency parsing results shown in the following figure, write down the state of the stack, the buffer and the action at each step.
[7 marks]
Sequence Tagging
Question 5
(a) Which of the following methods is/are generative models for sequence tagging: (1) Hidden Markov Model, (2) Maximum Entropy Markov Model, (3) Conditional Random Field, (4) Bidirectional LSTM? (2 points) What are the advantages of using a recurrent neural network based model over an HMM for sequence tagging? Please list at least two advantages. (4 points) [6 marks]
(b) Slot filling is the task of interpreting user commands/queries by extracting the attribute values of relevant aspects/slots. For example,
Query: What flights are available from pittsburgh to baltimore on thursday morning 9:00am.
Slots:
• from city: pittsburgh
• to city: baltimore
• depart date: thursday
• depart time: morning 9:00am
Please describe how we could convert this problem into a sequence tagging problem. If we are interested in extracting four slots as in the above example, how many unique tags do we need?
[4 marks]
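One common way to cast slot filling as sequence tagging is the BIO scheme, sketched below under that assumption. Other schemes such as IO or BIOES exist and lead to a different tag inventory. The function names and the underscore-joined slot identifiers are hypothetical choices for this example.

```python
def bio_tags(tokens, slots):
    """Label each token as B-<slot>, I-<slot> or O.

    `slots` maps a slot name to the list of tokens that form its value.
    """
    tags = ["O"] * len(tokens)
    for name, value in slots.items():
        n = len(value)
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] == value:
                tags[start] = f"B-{name}"      # first token of the slot value
                for k in range(start + 1, start + n):
                    tags[k] = f"I-{name}"      # remaining tokens of the slot value
                break  # simplification: only the first match is tagged
    return tags


def bio_tag_set(slot_names):
    """All tags needed under BIO: one B- and one I- tag per slot, plus O."""
    tags = {"O"}
    for name in slot_names:
        tags.update({f"B-{name}", f"I-{name}"})
    return tags


# The query and slots from the example above.
QUERY = "what flights are available from pittsburgh to baltimore on thursday morning 9:00am".split()
SLOTS = {
    "from_city": ["pittsburgh"],
    "to_city": ["baltimore"],
    "depart_date": ["thursday"],
    "depart_time": ["morning", "9:00am"],
}
```

With this encoding, `bio_tags(QUERY, SLOTS)` yields one tag per token, which is the training target for any sequence tagger, and `bio_tag_set(SLOTS)` enumerates the tag inventory for the chosen scheme.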
[Total for Question 5: 10 marks]
Question 6
(a) What is a task-oriented dialogue system? How does it differ from a chitchat dialogue system? [3 marks]
(b) Relation extraction aims to automatically extract the relationship between entities from natural language. A relation is usually represented as a triplet (Argument 1, relation type, Argument 2). A relation extraction system can identify whether a sentence provides information about the relation of interest and can automatically extract the values of the arguments. For example,
Suppose we want to build a relation extraction system for automatically collecting information about celebrities. We are interested in the following three relations: (People, born in, Place), (People, marries/is married to, People) and (People, won, prize). Please design a system to achieve automatic extraction of the above relations. You need to describe your design, addressing in particular the following issues:
(1) The models and technologies used in the system, and how they are used in your design. (4 points)
(2) If you use manually designed features, please describe which features you are going to use. If you use machine learning models, please describe how to obtain the training data if you only have a limited budget to hire someone to perform annotation. (5 points)
[9 marks]
[Total for Question 6: 12 marks]
2022-06-23