COM3110: Text Processing (2022/2023)
Assignment: Sentiment Analysis of Movie Reviews
1 Project Description
The aim of this project is to implement a corpus-based Naive Bayes model for a sentiment analysis task using the Rotten Tomatoes movie review dataset. This dataset is derived from the "Sentiment Analysis on Movie Reviews" Kaggle competition, which uses data from the works of [Pang and Lee, 2005] and [Socher et al., 2013]. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this task very challenging.
2 Submission
Submit your assignment work electronically via Blackboard. Precise instructions for what files to submit are given later in this document. Please check you have access to the relevant Blackboard unit and contact the module lecturer if not.
SUBMISSION DEADLINE: 15:00, Friday week 11 (9th December, 2022)
Penalties: standard departmental penalties apply for late hand-in and use of unfair means
3 Data Description
The dataset is a corpus of movie reviews originally collected by Pang and Lee. This dataset contains tab-separated files with phrases from the Rotten Tomatoes dataset. The data are split into train/dev/test sets and the sentences are shuffled from their original order.
• Each sentence has a SentenceId.
• All sentences have already been tokenized.
The training, dev and test sets contain 6529, 1000 and 1000 sentences, respectively. The sentences are labelled on a scale of five values:
0. negative
1. somewhat negative
2. neutral
3. somewhat positive
4. positive
In the following table you can find several sentences and their sentiment score. Please note that the test set is "blind", i.e. you are not given the gold standard sentiment scores. You will need to submit some files with the predicted labels for the test set and we will use these as part of your assessment (see below).
SentenceId | Phrase                                                      | Sentiment
1292       | The Sweetest Thing leaves a bitter taste .                  | 0
343        | It labours as storytelling                                  | 1
999        | There ’s plenty to enjoy – in no small part thanks to Lau . | 3
1227       | Compellingly watchable .                                    | 4
4 Evaluation
Systems are evaluated according to the macro-F1 score, i.e. the mean of the class-wise F1-scores:

macro-F1 = (1 / N) * sum_i F1-score_i

where N is the number of classes. The F1-score is calculated for each class i as:

F1-score_i = (2 * Precision_i * Recall_i) / (Precision_i + Recall_i) = (2 * TP_i) / (2 * TP_i + FP_i + FN_i)
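As an illustration of the metric, a minimal sketch in plain Python (the helper name `macro_f1` and its signature are my own, not the required implementation):

```python
def macro_f1(gold, pred, classes):
    """Mean of per-class F1 scores; a class with an empty denominator gets F1 = 0."""
    f1_scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

For example, `macro_f1([0, 1, 2, 2], [0, 1, 2, 1], classes=[0, 1, 2])` averages per-class F1 values of 1.0, 2/3 and 2/3.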
5 Project Roadmap
1. Implement preprocessing steps:
• You are free to add any preprocessing step (e.g. lowercasing) before training your models. Explain what you did in your report. Please note that these preprocessing steps are "universal", i.e. they should be applied to all models you train.
• Implement a function to map the 5-value sentiment scale to a 3-value sentiment scale. Namely, the labels "negative" (value 0) and "somewhat negative" (value 1) are merged into label "negative" (value 0). "Neutral" (value 2) will be mapped to "neutral" (value 1). And finally, "somewhat positive" (value 3) and "positive" (value 4) will be mapped to the label "positive" (value 2).
2. Implement a Naive Bayes classifier from scratch.
• You may NOT re-use already implemented classes/functions (e.g. scikit-learn)
3. For each set of labels (5-value and 3-value scales), train two different models (i.e. 4 models in total):
• One considering all the words in the training set as features (after your defined pre-processing steps).
• One with a set of features of your choice determined by your experience (you will explain how you selected the features in your short report).
4. Implement the macro-F1 score metric from scratch.
5. Compute and display confusion matrices on the development set for each developed model. Compare the results using confusion matrices and macro-F1.
6. Process the test data with your best performing models (one for each class setting).
7. Write a report (see below for details).
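Steps 1 and 2 of the roadmap might be sketched as follows. This is only an illustration: the names (`map_to_3`, `NaiveBayes`) are my own, and the add-one smoothing and the handling of unseen words are assumptions, not requirements of the brief.

```python
from collections import Counter
import math

def map_to_3(label_5):
    """Map the 5-value sentiment scale onto the 3-value scale from the brief:
    {0, 1} -> 0 (negative), {2} -> 1 (neutral), {3, 4} -> 2 (positive)."""
    return {0: 0, 1: 0, 2: 1, 3: 2, 4: 2}[label_5]

class NaiveBayes:
    """Multinomial Naive Bayes trained from raw counts (no sklearn)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for words, c in zip(docs, labels):
            self.word_counts[c].update(words)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.class_totals = {c: sum(self.word_counts[c].values()) for c in self.classes}

    def predict(self, words):
        def log_posterior(c):
            score = self.log_prior[c]
            for w in words:
                if w in self.vocab:  # words unseen in training are ignored (an assumption)
                    score += math.log((self.word_counts[c][w] + 1)
                                      / (self.class_totals[c] + len(self.vocab)))
            return score
        return max(self.classes, key=log_posterior)
```

Working in log space avoids numerical underflow when multiplying many small probabilities.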
6 What to Submit
Your assignment work is to be submitted electronically using Blackboard, and should include:
1. Your Python code.
You should use Python 3.9.x (or above) and consider the provided NB_sentiment_analyser.py file as your main file. You can (and you are encouraged to) create other files to organise your code in classes (therefore, the final submission is composed of all the files needed to run your code). However, the "interface" of your project should be through this provided file. To run and test your code, we will use the already pre-defined parameters in a command line:
python NB_sentiment_analyser.py <TRAINING_FILE> <DEV_FILE> <TEST_FILE> -classes <NUMBER_CLASSES> -features <all_words,features> -output_files -confusion_matrix
where:
• <TRAINING_FILE> <DEV_FILE> <TEST_FILE> are the paths to the training, dev and test files, respectively;
• -classes <NUMBER_CLASSES> should be either 3 or 5, i.e. the number of classes being predicted;
• -features is a parameter to define whether you are using your selected features or no features (i.e. all words);
• -output_files is an optional value defining whether or not the prediction files should be saved (see below – default is "files are not saved"); and
• -confusion_matrix is an optional value defining whether confusion matrices should be shown (default is "confusion matrices are not shown").
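The provided skeleton file already defines this interface; purely for reference, an argparse setup matching the options above might look like the sketch below (this is my own reconstruction, not the provided code, and the defaults are assumptions):

```python
import argparse

def build_parser():
    """Command-line interface mirroring the assignment's pre-defined parameters."""
    parser = argparse.ArgumentParser(description="Naive Bayes sentiment analyser")
    parser.add_argument("training")  # <TRAINING_FILE>
    parser.add_argument("dev")       # <DEV_FILE>
    parser.add_argument("test")      # <TEST_FILE>
    parser.add_argument("-classes", type=int, choices=[3, 5], default=5)
    parser.add_argument("-features", choices=["all_words", "features"],
                        default="all_words")
    parser.add_argument("-output_files", action="store_true", default=False)
    parser.add_argument("-confusion_matrix", action="store_true", default=False)
    return parser
```

Note that argparse accepts single-dash long options such as `-classes`, matching the command line shown above.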
A standard output of your program is also already pre-defined and available in the NB_sentiment_analyser.py file. It is a tab-separated output that will contain:
Student [tab] Number of classes [tab] Features [tab] macro-F1(dev)
For instance, for the following input:
python NB_sentiment_analyser.py train.tsv dev.tsv test.tsv -classes 3 -features all_words
where we want the results for 3 classes and using all words as features, the expected program output is:
acpXXjd [tab] 3 [tab] False [tab] 0.200
2. A README file containing all the details about your implementation that are needed to run your code. You are expected to use Python 3.9.x (or above); however, if you have any compelling reason to use a different version, you should clearly explain it in this README. In this file you should also include all the libraries that you used. Standard libraries like numpy and pandas do not require much detail (unless you rely on a specific version). However, if you use any other non-standard library (e.g. when extracting features for the Naive Bayes model), you need to detail their installation here.
3. Four files with the predictions on the development and test corpora considering your best model (either with or without your features) in each setting, i.e. either 3 or 5 classes.
The format is tab-separated as follows: SentenceId[tab]Sentiment
An example file named "SampleSubmission_test_predictions_5classes_acpXXjd.tsv" is provided with the data.
Those files MUST BE NAMED respectively:
• dev_predictions_3classes_<USER_ID>.tsv
• test_predictions_3classes_<USER_ID>.tsv
• dev_predictions_5classes_<USER_ID>.tsv
• test_predictions_5classes_<USER_ID>.tsv
where USER_ID is the student ID that you use to log into MUSE (i.e. the IDs starting with "acp", "mm", etc.).
We will use these files to calculate the performance of your best system on the development set and test set.
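Writing these files in the required tab-separated format is straightforward; a minimal sketch (the helper name `save_predictions` is my own, and whether a header row is expected should be checked against the provided SampleSubmission file):

```python
import csv

def save_predictions(path, sentence_ids, predictions):
    """Write SentenceId<tab>Sentiment rows to a .tsv file.
    The header row is an assumption; verify against the sample submission file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["SentenceId", "Sentiment"])
        for sid, pred in zip(sentence_ids, predictions):
            writer.writerow([sid, pred])
```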
4. A short report (as a PDF file).
It should NOT EXCEED 2 PAGES IN LENGTH. The report should include a brief description of the extent of the implementation achieved, and should present the performance results you have collected under different configurations, and any conclusions you draw from your analysis of these results. Graphs/tables may be used in presenting your results, to aid exposition.
7 Assessment Criteria
A total of 25 marks are available for the assignment and will be assigned based on the following general criteria (a more detailed marking codebook will be released later).
Implementation and Code Style – including README file (15 marks)
Have appropriate Python constructs been used? Is the code comprehensible and clearly commented? Does your code run and follow the pre-defined instructions? Is the Naive Bayes implementation correct? Were all the functionalities adequately implemented?
Report (10 marks)
Is the report a clear and accurate description of the implementation? How complete and accurate is the discussion of the performance of the different systems under a range of configurations? How do you choose which is the best model? Did your models show improvements over a majority class baseline?
8 Notes and Comments
• Consider using the Pandas library to load the data: https://pandas.pydata.org/.
• Consider using Seaborn heatmaps to render the confusion matrices: https://seaborn.pydata.org/.
• You may search the internet for lists of English punctuation and/or stopwords (also called function words) that you may use in your assignment.
• sklearn functions (such as CountVectorizer) *SHOULD NOT BE* used. All Naive Bayes calculations (including the count of words) should be made from scratch (you can use numpy).
• For your information, the majority class macro-F1 results in the dev set for the different class settings are:
– 3-class: 0.200
– 5-class: 0.089
References
[Pang and Lee, 2005] Pang, B. and Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan. Association for Computational Linguistics.
[Socher et al., 2013] Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., and Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA. Association for Computational Linguistics.