

DS2000

Spring 2022

Homework 6

Problem 1 - Class Examples

● Filename: sentiment.py

● Data file: reddit.txt

 

Has the Northeastern community been calm or worried about recent covid-related announcements from the administration? For this homework problem, we’ve compiled a bunch of recent comments on the NEU subreddit about covid, and we’ll perform a sentiment analysis over all of them.

 

The file is structured like this, over and over again:

NEU

Username

Points

Timestamp

Comment

Blank line
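One way to pull out only the comment lines is sketched below. It assumes every record is exactly the six lines listed above, so the comment is always the line at index 4 within each record. The names `extract_comments` and `read_comments` are hypothetical helpers. If a comment in the real reddit.txt spans several lines, this fixed-stride approach breaks, and you would need to locate each `NEU` line and read relative to it instead.

```python
RECORD_LEN = 6       # NEU, username, points, timestamp, comment, blank line
COMMENT_OFFSET = 4   # the comment is the 5th line (index 4) of each record


def extract_comments(lines):
    """Return the comment line from each fixed-size record (assumed layout)."""
    return [lines[i].strip() for i in range(COMMENT_OFFSET, len(lines), RECORD_LEN)]


def read_comments(filename):
    """Read the data file and keep only the comment lines."""
    with open(filename, encoding="utf-8") as infile:
        return extract_comments(infile.read().splitlines())
```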

 

We’re interested in the sentiment values of comment lines. You can ignore everything else in the file.

 

Make a list of calm/positive words, and a list of worried/negative words -- you can reuse material from class, or come up with your own. Compute the sentiment score of each comment by adding 1 for every calm word and subtracting 1 for every worried word. Divide by the number of words in the comment to get a sentiment score between -1 and 1.
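A minimal sketch of the scoring rule, assuming the comment has already been cleaned (lowercase, no punctuation). The two word sets are placeholder examples made up for illustration, not the lists from class.

```python
# Placeholder word lists (hypothetical); substitute your own or the class lists.
CALM_WORDS = {"calm", "safe", "fine", "good", "relieved", "great"}
WORRIED_WORDS = {"worried", "scared", "bad", "sick", "anxious", "afraid"}


def sentiment_score(comment):
    """+1 per calm word, -1 per worried word, divided by the word count."""
    words = comment.split()
    if not words:  # an empty comment would otherwise divide by zero
        return 0.0
    score = 0
    for word in words:
        if word in CALM_WORDS:
            score += 1
        elif word in WORRIED_WORDS:
            score -= 1
    return score / len(words)
```

Because each word contributes at most +1 or -1, dividing by the word count keeps every score in the range -1 to 1.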

 

Finally, compute and print out the average sentiment score of all comments. This helps us answer the question, overall, has our community been more calm or more worried in the last few months?
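The average is the sum of the per-comment scores divided by their count. This sketch (hypothetical helper names) guards against an empty list and turns the result into a one-word answer to the calm-or-worried question.

```python
def average_sentiment(scores):
    """Return the mean of the per-comment scores, or 0.0 for an empty list."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def overall_mood(average):
    """Translate an average score into 'calm', 'worried', or 'neutral'."""
    if average > 0:
        return "calm"
    if average < 0:
        return "worried"
    return "neutral"
```

A typical call would be `avg = average_sentiment(scores)` followed by `print(f"Average sentiment: {avg:.3f} ({overall_mood(avg)})")`.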

 

Do a couple of things along the way:

● Clean up the comments -- all lowercase and stripped of punctuation/numbers. This makes it easier to find positive/negative words from your lists.

● Create a scatterplot -- we want to see a trend over time; have we gotten more or less chill? The file is not in the order we want, though, because it goes new-to-old, and we want old-to-new.
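The cleanup step can be done with the standard library's `str.maketrans` and `str.translate`, building one table that deletes every ASCII punctuation character and digit. The curly quotes are added by hand as an assumption, because `string.punctuation` covers ASCII only and Reddit text often contains typographic apostrophes.

```python
import string

# ASCII punctuation and digits, plus curly quotes that string.punctuation misses
# (an assumption about what appears in reddit.txt).
REMOVE = string.punctuation + string.digits + "\u2018\u2019\u201c\u201d"
CLEAN_TABLE = str.maketrans("", "", REMOVE)  # third argument: characters to delete


def clean(text):
    """Lowercase the text and delete punctuation and digits."""
    return text.lower().translate(CLEAN_TABLE)
```

Deleting characters merges hyphenated words ("well-known" becomes "wellknown"); mapping punctuation to spaces instead is a reasonable alternative if your word lists contain such words.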

 

Assuming your sentiment scores are in a list, use Python’s list method reverse() to flip the order in place so it goes old-to-new. For each comment, plot its sentiment score as the y-value and its position in the list as the x-value. Use different colors for positive (> 0), negative (< 0), and neutral (= 0) sentiment scores. For full credit, your plot must include appropriate titles, labels, and a legend.
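A sketch of the plot with matplotlib, assuming `scores` is the list of per-comment scores in file order (new-to-old). The grouping into positive, negative, and neutral points lives in its own helper (`split_by_sign`, a hypothetical name) so it can be checked without drawing anything; matplotlib is imported inside `plot_sentiment` for the same reason. The colors and title text are illustrative choices.

```python
def split_by_sign(scores):
    """Group (position, score) points into positive, negative and neutral lists."""
    groups = {"positive": ([], []), "negative": ([], []), "neutral": ([], [])}
    for x, y in enumerate(scores):
        if y > 0:
            key = "positive"
        elif y < 0:
            key = "negative"
        else:
            key = "neutral"
        groups[key][0].append(x)  # x-values: position in the list
        groups[key][1].append(y)  # y-values: sentiment score
    return groups


def plot_sentiment(scores):
    """Scatterplot of scores old-to-new, colored by sign."""
    import matplotlib.pyplot as plt  # imported here so split_by_sign works without it

    ordered = list(scores)  # copy so the caller's list is not modified
    ordered.reverse()       # file order is new-to-old; flip to old-to-new
    colors = {"positive": "green", "negative": "red", "neutral": "gray"}
    for label, (xs, ys) in split_by_sign(ordered).items():
        plt.scatter(xs, ys, color=colors[label], label=label)
    plt.title("Sentiment of NEU subreddit covid comments over time")
    plt.xlabel("Comment position (oldest to newest)")
    plt.ylabel("Sentiment score")
    plt.legend()
    plt.show()
```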

 

Remaining Problems (one file total):

● Filename: wheel.py

● Data file: puzz.txt

 

Download the txt file linked above. It contains a whole bunch of puzzles from America’s favorite game show... Wheel! Of! Fortune! Each line of the file has one puzzle (a word or phrase, possibly including punctuation). The file contains a mix of upper- and lower-case, so we suggest putting everything in upper-case for consistency.
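Loading the puzzles and upper-casing them might look like this. It assumes one puzzle per line and skips blank lines; `parse_puzzles` and `load_puzzles` are hypothetical names.

```python
def parse_puzzles(lines):
    """Strip and upper-case each non-blank line; one puzzle per line."""
    return [line.strip().upper() for line in lines if line.strip()]


def load_puzzles(filename):
    """Read puzz.txt (or any file with the same layout) into a list of puzzles."""
    with open(filename, encoding="utf-8") as infile:
        return parse_puzzles(infile)
```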

 

For this homework, you’ll re-create the end-of-game “Bonus Puzzle”, where one contestant competes.

1. The puzzle is displayed, with 6 common English letters revealed: R, S, T, L, N, and E. The remaining letters are displayed as blanks.

2. The contestant guesses 3 more consonants and 1 more vowel. These letters are then revealed.

3. The contestant attempts to guess the puzzle.
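Steps 1 and 2 both come down to showing the puzzle with some set of letters revealed. A sketch of that display, assuming blanks are shown as `_` and that spaces and punctuation are never hidden. `display` and `GIVEN` are hypothetical names.

```python
# R, S, T, L, N and E are revealed for free in the bonus round.
GIVEN = set("RSTLNE")


def display(puzzle, revealed):
    """Return the board: revealed letters as-is, hidden letters as '_'.

    Spaces and punctuation are always shown. Characters are joined with
    single spaces so a run of blanks can be counted.
    """
    shown = []
    for ch in puzzle.upper():
        if ch.isalpha() and ch not in revealed:
            shown.append("_")
        else:
            shown.append(ch)  # a revealed letter, a space, or punctuation
    return " ".join(shown)
```

For step 2, the same function works with a larger set, e.g. `display(puzzle, GIVEN | {"C", "D", "M", "A"})` (letters chosen arbitrarily here).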

 

Problem 2 - Letter Frequencies

 

Prof. Rachlin is going on Wheel of Fortune and knows he won’t do very well, so you agree to help him cheat. You’ll help him pick the 3 extra consonants and 1 extra vowel. Your reasoning: the more common a letter is across all puzzles, the more likely it is to show up in the bonus puzzle.

 

Therefore, to help John out you need to:

● Compute the most common letters -- in particular, the 3 most common consonants and 1 most common vowel. These should NOT include R, S, T, L, N, or E, since they’re already covered.

● Print out your answers so we know the letters you’ll be revealing.
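A sketch of the frequency count using `collections.Counter`. It counts every letter across all puzzles, drops R, S, T, L, N, and E, then takes the top consonants and vowels from the ranked list. Treating Y as a consonant is an assumption; `best_letters` is a hypothetical name. When two letters tie, `most_common()` keeps them in the order they were first seen.

```python
from collections import Counter

VOWELS = set("AEIOU")   # Y is treated as a consonant here (an assumption)
GIVEN = set("RSTLNE")   # already revealed, so excluded from the count


def best_letters(puzzles, n_consonants=3, n_vowels=1):
    """Return (most common consonants, most common vowels) across all puzzles."""
    counts = Counter(
        ch
        for puzzle in puzzles
        for ch in puzzle.upper()
        if ch.isalpha() and ch not in GIVEN
    )
    ranked = [letter for letter, _ in counts.most_common()]  # highest count first
    consonants = [c for c in ranked if c not in VOWELS][:n_consonants]
    vowels = [c for c in ranked if c in VOWELS][:n_vowels]
    return consonants, vowels
```

Printing the answer could then be `print("Consonants:", ", ".join(consonants), "| Vowel:", vowels[0])`.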

 

Problem 3 - Let’s Play Our Game!

 

Prof. Rachlin is on the game show, and it’s time for the bonus puzzle! Your program should now...

● Pick a puzzle from the file at random.

● Display R, S, T, L, N, and E where needed but blanks otherwise.

 

Then, help him cheat: Based on part 2, you know the 3 most common consonants and 1 most common vowel over all puzzles. Reveal those letters as well.
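Picking a random puzzle and producing both boards (first with only R, S, T, L, N, and E, then with the extra letters from Problem 2 added) could be sketched as below. It assumes the puzzles are already upper-case; `board` and `start_round` are hypothetical names, and the optional `rng` parameter exists only so the choice can be made reproducible with `random.Random(seed)`.

```python
import random

GIVEN = set("RSTLNE")


def board(puzzle, revealed):
    """Show revealed letters, '_' for hidden letters, punctuation unchanged."""
    return " ".join(ch if (not ch.isalpha() or ch in revealed) else "_" for ch in puzzle)


def start_round(puzzles, extra_letters, rng=random):
    """Pick a puzzle at random; return it, its RSTLNE board, and its cheat board."""
    puzzle = rng.choice(puzzles)
    first = board(puzzle, GIVEN)
    cheat = board(puzzle, GIVEN | set(extra_letters))
    return puzzle, first, cheat
```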

 

Finally, let Prof. Rachlin guess the puzzle based on what’s displayed now. He gets only one chance to guess; tell him if he’s right or wrong.
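The one-shot guess can be checked by normalizing both strings before comparing, so lowercase input and stray spaces don't count against the player. A sketch with hypothetical helper names:

```python
def normalize(text):
    """Upper-case text and collapse runs of whitespace to single spaces."""
    return " ".join(text.upper().split())


def check_guess(guess, puzzle):
    """True if the guess matches the puzzle, ignoring case and extra spacing."""
    return normalize(guess) == normalize(puzzle)


def final_guess(puzzle):
    """Give the player exactly one guess and report the result."""
    guess = input("Solve the puzzle: ")
    if check_guess(guess, puzzle):
        print("That's right! You win!")
    else:
        print(f"Sorry, that's wrong. The puzzle was: {puzzle}")
```

Punctuation still has to match here; stripping it from both sides before comparing would be a more forgiving design choice.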

 

Here’s an example of our program running. Note that the punctuation from the original puzzle is displayed, and it was fine for the player to type in lowercase even though the puzzle is all in uppercase. John lost, but I think the actual puzzle was weird (what even is “nanki-poo”?)