
Take Home Project #2 (25 Marks)

IS6751 Text & Web Mining

Due on 25 April 2023

1.    Build Sentiment Classifiers with the following requirements.

1.1. Dataset

-         Use reviews_with_splits_lite.csv in rnn-transformer-sentiment\data\rt-polarity: a sentence-level dataset from Rotten Tomatoes (http://www.rottentomatoes.com/) with 5331 positive sentences and 5331 negative sentences.

-         Do not change the split values in the file, since everyone should use the same test data (a minimal loading sketch that keeps the splits intact follows below).
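The following is a minimal loading sketch, assuming the CSV has review, rating, and split columns (verify the column names and the path against your copy of the data); it only illustrates how to keep the provided train/validation/test split untouched.

    # Minimal sketch: load the dataset and keep the provided split values untouched.
    # Assumption: the file has "review", "rating", and "split" columns.
    import pandas as pd

    df = pd.read_csv("rnn-transformer-sentiment/data/rt-polarity/reviews_with_splits_lite.csv")

    # Everyone must evaluate on the same test data, so never reshuffle or reassign splits.
    train_df = df[df["split"] == "train"]
    val_df = df[df["split"] == "val"]
    test_df = df[df["split"] == "test"]

    print(len(train_df), len(val_df), len(test_df))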

1.2. Algorithm

-         Use the following code as baseline models.

•   Sentiment-Classification-with-RNNs-v3.ipynb

•   Sentiment-Classification-with-bert-v1.ipynb (use a GPU machine, such as Google Colab)

•   Sentiment-Classification-with-other-transformer-models-v1.ipynb (use a GPU machine, such as Google Colab)

1.3. Tasks

-         Task 1: optimize the RNN model (Sentiment-Classification-with-RNNs-v3.ipynb).

-         Task 2: optimize either the BERT model (Sentiment-Classification-with-bert-v1.ipynb) or any other transformer model, such as DistilRoBERTa or RoBERTa (Sentiment-Classification-with-other-transformer-models-v1.ipynb), to develop the best-performing model (a minimal checkpoint-swap sketch appears after this list).

•   For Task 2, if you can provide better results than the two provided transformer-based models' code (i.e., Sentiment-Classification-with-bert-v1.ipynb and Sentiment-Classification-with-other-transformer-models-v1.ipynb), you can use any type of Neural Network to develop the best model in PyTorch with the provided dataset. For instance, you can use a Multi-Layer Neural Network, CNN, BERT, any other transformer-based model, etc. (note: RNN models, including LSTM and GRU, and traditional models, such as SVM and Naïve Bayes classifiers, are not allowed).

•   For Task 2, even though I recommend you use a GPU machine, if you cannot use one for some reason, such as usage limits in Google Colab, you may use a CNN model instead.
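For Task 2, the sketch below (not the notebooks' own code) shows one way to swap in a different pre-trained checkpoint with Hugging Face Transformers; the checkpoint name distilroberta-base and num_labels=2 are illustrative assumptions for this binary task, and fine-tuning on the provided dataset is still required.

    # Minimal sketch: load an alternative transformer checkpoint for binary sentiment classification.
    # Assumptions: Hugging Face Transformers is installed; the checkpoint name is illustrative.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    checkpoint = "distilroberta-base"  # could also try "roberta-base", etc.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    # Tokenize a small batch of review sentences and run a forward pass.
    batch = tokenizer(["a gripping, beautifully shot film", "a tedious mess"],
                      padding=True, truncation=True, return_tensors="pt")
    outputs = model(**batch)
    print(outputs.logits.shape)  # (batch_size, 2)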

1.4. Improve the model using the following approaches.

-         Text pre-processing, such as removing stop words, lemmatization, case folding, etc. (see the pre-processing sketch after this list).

-         Change hyperparameters, such as learning rate, # of hidden units, mini-batch size, # of layers, dropout, batch norm, regularization, etc. (see the hyperparameter sketch after this list).

-         Any other techniques you would like to use.

-         Note that you cannot use an existing model without training it on the provided dataset.
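As an illustration of the first bullet, here is a minimal pre-processing sketch using NLTK (an assumption; any equivalent library is fine) that applies case folding, stop-word removal, and lemmatization to one review string.

    # Minimal pre-processing sketch: case folding, stop-word removal, lemmatization (NLTK assumed).
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    nltk.download("stopwords")
    nltk.download("wordnet")

    stop_words = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()

    def preprocess(text):
        tokens = text.lower().split()                         # case folding + naive tokenization
        tokens = [t for t in tokens if t not in stop_words]   # stop-word removal
        return " ".join(lemmatizer.lemmatize(t) for t in tokens)  # lemmatization

    print(preprocess("The actors were delivering truly memorable performances"))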
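For the second bullet, the sketch below shows where some of these hyperparameters typically plug into a PyTorch model and optimizer; the values are placeholders rather than recommendations, and mini-batch size and number of layers would similarly be set where you build the DataLoader and the encoder.

    # Hyperparameter sketch (illustrative values only): hidden units, dropout, batch norm,
    # learning rate, and weight decay (L2 regularization) on a small classification head.
    import torch
    import torch.nn as nn

    hidden_size = 128      # number of hidden units
    dropout = 0.3
    learning_rate = 1e-3
    weight_decay = 1e-5    # L2 regularization

    head = nn.Sequential(
        nn.Linear(hidden_size, hidden_size),
        nn.BatchNorm1d(hidden_size),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(hidden_size, 2),   # two classes: positive / negative
    )

    optimizer = torch.optim.Adam(head.parameters(), lr=learning_rate, weight_decay=weight_decay)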

Submission:

Submit one zip file (use only zip compression), named home-project-no2-yourname.zip, that contains your report file, Jupyter Notebook files, data files (i.e., input data), and the best model files (e.g., model.pth files) through Turnitin on the class website.

-     Write a report in Word or PDF that discusses your observations, such as test results with various approaches. The report should contain up to 2,000 words; write down the total number of words on your cover page.

-     The report file should have a cover page.

-     The Jupyter notebook files must show all output results of your Python code, so please make sure that you run all the cells in the notebook files before your submission.

-     If model files are too big for Turnitin submission, you do not need to include them in your zip file.

-     Note that Turnitin does not allow you to resubmit your assignment file.

-     Reports and required files submitted after the due date will not be marked because of the strict university deadline.