CE 314/887 Natural Language Engineering Assignment 1
Published: 2022-11-16
Regular expressions (40%) (You can store your code in the file part1_regex_studentID.py)
1: Write a regular expression that can find all amounts of money in a text. Your expression should be able to deal with different formats and currencies, for example £50,000 and £117.3m as well as 30p, 500m euro, 338bn euros, $15bn and $92.88. Make sure that you can at least detect amounts in Pounds, Dollars and Euros. (You should write a Python program to check the matching results, 20 pts)
For full marks: include the output of a Python program that applies your regular expression to the following page:
https://www.bbc.co.uk/news/business-41779341
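One possible starting point for question 1 is sketched below. It is an illustration rather than a model answer: the names NUMBER, SCALE, MONEY_RE and find_money are hypothetical, the sample sentence is built only from the example amounts listed in the question, and the set of scale suffixes (m, bn, k and so on) is an assumption that may need extending for the real article text.

```python
import re

# A number such as 50,000 or 117.3 or 92.88 (thousands separators optional).
NUMBER = r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?"
# Assumed scale suffixes; longer words are listed before their abbreviations.
SCALE = r"(?:trillion|billion|million|thousand|bn|tn|m|k)"

# Branch 1: currency symbol first, e.g. £50,000, £117.3m, $15bn, $92.88
SYMBOL_FIRST = r"[£$€]\s?" + NUMBER + r"(?:\s?" + SCALE + r"\b)?"
# Branch 2: currency word last, e.g. 500m euro, 338bn euros
WORD_LAST = NUMBER + r"\s?(?:" + SCALE + r"\s?)?(?:euros?|pounds?|dollars?)\b"
# Branch 3: pence, e.g. 30p
PENCE = r"\d+p\b"

MONEY_RE = re.compile("|".join([SYMBOL_FIRST, WORD_LAST, PENCE]), re.IGNORECASE)


def find_money(text):
    """Return every substring of text that looks like an amount of money."""
    # All groups are non-capturing, so findall returns the full matches.
    return MONEY_RE.findall(text)


SAMPLE = ("Amounts such as £50,000 and £117.3m as well as 30p, "
          "500m euro, 338bn euros, $15bn and $92.88.")

if __name__ == "__main__":
    for amount in find_money(SAMPLE):
        print(amount)
```

The pence branch is deliberately loose, so a token such as a video resolution ending in "p" would also match; tightening it is left as a design decision.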
2: Write a regular expression that matches all of the phone numbers listed below: (You should write a Python program to check the matching results, 20 pts)
555.123.4565
+1-(800)-545-2468
2-(800)-545-2468
3-800-545-2468
555-123-3456
555 222 3342
(234) 234 2442
(243)-234-2342
1234567890
123.456.7890
123.4567
1234567900
12345678900
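A minimal sketch of how the list above could be checked is shown here. The pattern is one of many that would work and is not presented as the expected answer; the names PHONE_RE and NUMBERS are hypothetical. It assumes an optional one- or two-digit country code, an optional three-digit area code (with or without parentheses), and separators limited to a hyphen, a dot or a space.

```python
import re

PHONE_RE = re.compile(
    r"(?:\+?\d{1,2}[-. ]?)?"           # optional country code, e.g. "+1-" or "3-"
    r"(?:(?:\(\d{3}\)|\d{3})[-. ]?)?"  # optional area code, parentheses balanced
    r"\d{3}[-. ]?\d{4}"                # three digits, then four digits
)

# The numbers listed in the question.
NUMBERS = [
    "555.123.4565", "+1-(800)-545-2468", "2-(800)-545-2468",
    "3-800-545-2468", "555-123-3456", "555 222 3342",
    "(234) 234 2442", "(243)-234-2342", "1234567890",
    "123.456.7890", "123.4567", "1234567900", "12345678900",
]

if __name__ == "__main__":
    for number in NUMBERS:
        # fullmatch requires the whole string to be a phone number.
        status = "match" if PHONE_RE.fullmatch(number) else "NO MATCH"
        print(f"{number:20} {status}")
```

Using fullmatch rather than search keeps the check strict: a string counts only if the entire line is a phone number, which also makes it easy to confirm that malformed inputs are rejected.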
NLTK (10%)
1: Find the 50 most frequent words in the Wall Street Journal corpus in nltk.book (text7). Submit your code under the name part2_NLTK_studentID.py. (Remove all punctuation and lowercase all words.)
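The counting step can be sketched with the standard library alone. The function name top_words and the toy token list are hypothetical, and the cleaning rule is an assumption: every punctuation character is deleted from each token, so a token like "U.S." becomes "us" and tokens made only of punctuation disappear.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def top_words(tokens, n=50):
    """Return the n most common words as (word, count) pairs.

    Tokens are lowercased and stripped of punctuation; tokens that
    become empty (pure punctuation) are dropped before counting.
    """
    cleaned = []
    for token in tokens:
        word = token.translate(_STRIP_PUNCT).lower()
        if word:
            cleaned.append(word)
    return Counter(cleaned).most_common(n)


if __name__ == "__main__":
    toy_tokens = ["The", "cat", ",", "the", "dog", ".", "THE", "end", "cat"]
    print(top_words(toy_tokens, n=2))
```

With NLTK installed and its book data downloaded, the same function should accept the corpus directly, for example `from nltk.book import text7` followed by `top_words(text7)`, since an NLTK Text object can be iterated token by token.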
Language modelling:
You should write a Python program for this part and name it part3_LM_studentID.py
1: Build an n-gram language model based on NLTK's Reuters corpus (from nltk.corpus import reuters), and provide the code. (You can build a language model in a few lines of code using the NLTK package; you can use bigrams, trigrams or higher-order n-grams.) (20 pts)
2: After step 1, make simple predictions with the language model you built in question 1. We will start with two simple words: “he is”. Let your n-gram model tell you what the next word will be, and show both the code and the model-generated results. (15 pts)
3: Based on the work in question 1 and question 2, generate a few sentences that start with “he is”. (15 pts)
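The three steps above can be sketched as a count-based trigram model. This is an illustration under stated assumptions, not the required solution: the function names are hypothetical, the toy sentences stand in for the Reuters corpus, tokens are lowercased so that “he is” can be looked up directly, and None is used as the padding and end-of-sentence marker.

```python
import random
from collections import Counter, defaultdict


def build_trigram_model(sentences):
    """Map each (w1, w2) context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = [w.lower() for w in sentence]
        # Pad both ends so sentence starts and ends are modelled too.
        padded = [None, None] + words + [None, None]
        for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
            model[(w1, w2)][w3] += 1
    return model


def next_word_probs(model, w1, w2):
    """Return {word: probability} for the word following (w1, w2)."""
    counts = model.get((w1, w2), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def generate(model, w1, w2, max_len=20, rng=random):
    """Sample words after (w1, w2) until the end marker or max_len."""
    out = [w1, w2]
    while len(out) < max_len:
        counts = model.get((out[-2], out[-1]))
        if not counts:
            break  # unseen context: nothing to sample from
        words = list(counts)
        nxt = rng.choices(words, weights=[counts[w] for w in words])[0]
        if nxt is None:
            break  # end-of-sentence marker
        out.append(nxt)
    return " ".join(out)


# Toy stand-in for the corpus (hypothetical data, for illustration only).
TOY_SENTENCES = [
    ["He", "is", "happy", "."],
    ["He", "is", "late", "."],
    ["He", "is", "happy", "today", "."],
    ["She", "is", "late", "."],
]

if __name__ == "__main__":
    lm = build_trigram_model(TOY_SENTENCES)
    print(next_word_probs(lm, "he", "is"))
    for _ in range(3):
        print(generate(lm, "he", "is"))
```

With NLTK and the Reuters data available, `reuters.sents()` yields tokenized sentences that could be passed in place of TOY_SENTENCES. Sampling in proportion to the counts, instead of always taking the most likely word, is what lets repeated calls produce different sentences.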
Hints:
For building n-grams, you can refer to this link:
https://medium.com/swlh/language-modelling-with-nltk-20eac7e70853#:~:text=An%20n%2Dgram%20model%20is,'There%20was%20huge%20rainfall'.
Writing code with comments is a good habit.