
CSCI 1133, Fall 2022

Homework 09

Purpose: The purpose of this homework is to use dictionaries to deal with large data sets and files.

Instructions: This assignment consists of 3 problems, worth a total of 30 points. You must create a Python file called hw09.py, and submit it to Gradescope.  If you have questions, ask!

Because your homework file is submitted and tested electronically, the following are very important:

●   Submit the correct file, hw09.py, through Gradescope by the due date.

●   Your program should run without errors when we execute it using Python 3.

Submit your work to Gradescope under the Homework 09 assignment. Make sure that your file is called hw09.py: if you don’t match the file name exactly, the testing script won’t work and you’ll lose points.

The following will result in a score reduction equal to a percentage of the total possible points:

●   Incorrectly named/submitted source file or functions (20%)

●   Failure to execute due to syntax errors (30%)

●   Bad coding style (missing documentation, meaningless variable names, etc.) (up to 30%)

Some of the points in this assignment are given based on the Gradescope autograder, so doing anything that breaks the Gradescope tests (naming your file incorrectly, naming your function incorrectly, having syntax errors, or using the input() function somewhere the autograder doesn’t expect) is likely to lose you all of those points.

Documentation:

Use the following template for EVERY function that you write.  The docstring below should be placed inside of the function, just after the function signature.

'''

Purpose:

Parameter(s):

Return Value:

'''
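For instance, a hypothetical helper that totals the counts in a dictionary (a step Problem A can use) would carry the template like this. The function name and the docstring wording are only an illustration of where the docstring goes:

```python
def total_count(counts):
    '''
    Purpose: Add up all of the counts stored in a dictionary.
    Parameter(s): counts - a dictionary mapping words to integer counts
    Return Value: the sum of all values in counts
    '''
    total = 0
    for word in counts:
        total += counts[word]  # add this word's count to the running total
    return total
```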

Problem A. (10 points) Weighted Random Choice

As part of Problem C, we’re going to randomly select words to create sentences. However, we don't want the selection to be uniformly random: we want some words to be more likely to appear than others. This is called weighted random selection. In this problem, we will build towards that goal by creating a function that does a weighted random selection from the keys of a dictionary, using the values as the weights.

Create a function weighted_choice that takes as a parameter a dictionary of words and their corresponding counts. The function should randomly choose one of the words, using the counts as weights in the selection. For example, if I pass the dictionary

{'green': 1, 'eggs': 1}

to this function, it should return “green” half the time and “eggs” half the time. If I pass the dictionary

{'green': 1, 'eggs': 3, 'ham': 2}

it should return “green” one-sixth (1/6) of the time, “eggs” half (3/6) of the time, and “ham” one-third (2/6) of the time.

There are a number of ways to accomplish this task. Many solutions iterate over the given dictionary to add up the counts, then use a function from the random module (such as random.randint) to pick a random number from 1 to that total. Then, you can use that number to figure out which word it corresponds to and return the appropriate word.

There’s also a simpler method: make a list containing x copies of each key (where x is the value for that key), and then use random.choice on that list.
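Both strategies can be sketched as follows. This is a minimal sketch, assuming every count is a positive integer and the dictionary is non-empty (random.randint(1, 0) would raise a ValueError). weighted_choice follows the running-total method, and weighted_choice_list is a hypothetical name for the list-of-copies method:

```python
import random


def weighted_choice(counts):
    '''
    Purpose: Randomly pick one key of counts, using the values as weights.
    Parameter(s): counts - a non-empty dictionary mapping words to positive integer counts
    Return Value: one key from counts
    '''
    total = sum(counts.values())
    # Draw a "ticket" between 1 and total, inclusive.
    ticket = random.randint(1, total)
    running = 0
    for word, count in counts.items():
        running += count
        # Each word owns `count` consecutive tickets; return the owner of this ticket.
        if ticket <= running:
            return word


def weighted_choice_list(counts):
    '''
    Purpose: Same behaviour as weighted_choice, using the list-of-copies method.
    Parameter(s): counts - a non-empty dictionary mapping words to positive integer counts
    Return Value: one key from counts
    '''
    pool = []
    for word, count in counts.items():
        pool.extend([word] * count)  # x copies of a word whose count is x
    return random.choice(pool)
```

With {'green': 1, 'eggs': 3, 'ham': 2} the total is 6: ticket 1 belongs to 'green', tickets 2 to 4 to 'eggs', and tickets 5 to 6 to 'ham'. The running-total version never builds an extra list, while the list version is shorter but uses memory proportional to the sum of the counts.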

Examples (random, so won’t necessarily match, but make sure that both come up about 50% of the time):

>>> weighted_choice({'green': 1, 'eggs': 1})

'green'

>>> weighted_choice({'green': 1, 'eggs': 1})

'eggs'

>>> weighted_choice({'green': 1, 'eggs': 1})

'eggs'

Note that testing this function is a little tricky because it relies on randomness. You will not see the same output every time! We can control for this a bit by running the function many times and checking that, on average, the observed frequencies roughly match the expected probabilities. For example, if we run

weighted_choice({'green': 1, 'eggs': 3, 'ham': 2})

600 times, then we would expect to get ‘green’ about 100 times, ‘eggs’ about 300 times, and ‘ham’ about 200 times.

>>> results = [weighted_choice({'green': 1, 'eggs': 3, 'ham': 2}) for i in range(600)]

>>> results.count('green')

99

>>> results.count('eggs')

296

>>> results.count('ham')

205

If your results are consistently off by more than 20 or so, you should probably check your algorithm.
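The frequency check described above can be automated with a small helper. This is a sketch, not part of the assignment: expected_vs_actual and stand_in_choice are hypothetical names, and stand_in_choice (built on random.choices) only exists so the snippet runs on its own. In practice you would pass your own weighted_choice instead:

```python
import random
from collections import Counter


def expected_vs_actual(choose, counts, trials=600):
    '''
    Purpose: Call choose(counts) many times and compare how often each word
             comes up with how often its weight says it should.
    Parameter(s): choose - a function such as weighted_choice;
                  counts - dictionary of word -> weight;
                  trials - how many times to call choose
    Return Value: dictionary mapping each word -> (expected count, observed count)
    '''
    total = sum(counts.values())
    observed = Counter(choose(counts) for _ in range(trials))
    report = {}
    for word, weight in counts.items():
        expected = trials * weight / total
        report[word] = (expected, observed[word])  # Counter gives 0 for unseen words
    return report


def stand_in_choice(counts):
    # Hypothetical stand-in so this file runs alone; pass your own weighted_choice instead.
    return random.choices(list(counts), weights=list(counts.values()))[0]


weights = {'green': 1, 'eggs': 3, 'ham': 2}
for word, (expected, observed) in expected_vs_actual(stand_in_choice, weights).items():
    print(word, 'expected', expected, 'observed', observed)
```

Raising trials (say to 6000) makes the observed counts track the expected ones more closely in relative terms, which makes a real bug easier to tell apart from ordinary random variation.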

Problem B. (10 points) Counting Votes

Minnesota is holding elections on the day that this assignment is due! We can help them count results by writing a program. Assume that each district compiles votes into a CSV file, where each row contains one citizen’s ballot data, and each column represents a different office up for election. The first row is header data indicating what office each column represents.

 

For example, the CSV file above would represent a district with three offices up for election, and five citizens who voted.

●   The first voted for Jobu Tupaki for Mayor, Shallan Davar for Sheriff, and Buffy Summers for Governor.

●   The second voted for Evelyn Wang for Mayor, Liz Lemon for Sheriff, and Leslie Knope for Governor.

and so on

We’re going to be creating a dictionary representing the vote counts for a specific office. For example, in the spreadsheet above, if I wanted a dictionary of vote counts for the office of County Sheriff, the result would be:

{'Shallan Davar': 1, 'Liz Lemon': 3, 'Ron Swanson': 1}

Whereas the vote counts for Governor would be:

{'Buffy Summers': 2, 'Leslie Knope': 1, 'Gordon Freeman': 1, 'Monkey D. Luffy': 1}

Write a function count_votes(district, office) that takes two strings as parameters:

●   district should be the name of a file containing all voting data for a given district, in the format specified above. You can assume that said file actually exists (no need for a try-except block).

●   office should be the name of one of the column titles present in the CSV file. You can assume that the office passed in will match one of the columns in the CSV file.

The function should return a dictionary in which each key is a name present in the column corresponding to the given office, and the value represents how many times that name occurs within the column.

Note that it is possible for someone to write in the same name for multiple different offices, so you do need to be careful to only count occurrences of the name in the column representing the office you’re counting votes for.

You are permitted to import the csv module, but this is not required.

Hints:

●   Be careful when the request is for the last column in the file - if you’re splitting by comma, then that column will contain the '\n' character, and this shouldn’t be included in the candidate names, or in the name of the office.

●   You can assume for the sake of simplicity that no candidate or office will contain a comma in the name.
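One way to approach this with the csv module is sketched below. It is a minimal sketch, assuming the header row holds the office titles exactly as they will be passed in and that every ballot row has one entry per column. Opening the file with newline='' and letting csv.reader split the rows means the trailing newline from the first hint never ends up in a name:

```python
import csv


def count_votes(district, office):
    '''
    Purpose: Count how many ballots in a district CSV file name each candidate for one office.
    Parameter(s): district - name of a CSV file whose first row holds office titles;
                  office - one of those titles
    Return Value: dictionary mapping candidate name -> number of votes
    '''
    counts = {}
    with open(district, newline='') as infile:
        reader = csv.reader(infile)
        header = [title.strip() for title in next(reader)]
        column = header.index(office)  # which column belongs to this office
        for row in reader:
            if not row:  # csv.reader yields [] for blank lines; skip them
                continue
            name = row[column].strip()
            counts[name] = counts.get(name, 0) + 1
    return counts
```

If you split lines by hand instead, calling .strip() on each line before .split(',') removes the '\n' from the last column and covers the same case.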

Disclaimer: The data in the CSV files was randomly generated from all names available on the ballot for a given district according to https://myballotmn.sos.state.mn.us/ (along with a few fictional characters to simulate write-ins). Don’t take any given candidate randomly getting “more votes” than another as an endorsement.

Examples (assumes that you have the sample files from hw09files.zip downloaded to the same directory that you’re running Python in):

>>> count_votes('district_0z.csv', 'County Sheriff')

{'Shallan Davar': 1, 'Liz Lemon': 3, 'Ron Swanson': 1}

>>> count_votes('district_4b.csv', 'Mayor')

{'Shelly Carlson': 7, 'Donna Meagle': 2, 'Kevin Nese Shores': 5}

>>> count_votes('district_60b.csv', 'County Commissioner District 4')

{'Angela Conley': 50, 'Monkey D. Luffy': 1, 'Jobu Tupaki': 2, 'Leslie Knope': 1}

Problem C. (10 points) Random Sentence Generation

Getting computers to produce convincing English text about a particular topic is a very challenging problem, so we won’t be doing that in this assignment. Instead, we’ll use a shortcut that quickly produces nonsense that somewhat resembles coherent English.

There are two versions of this problem: one easy and one hard. You won’t get any additional points for doing it the hard way, but it does produce more convincing results and is better practice if you have the time.

Write a function called random_sent(source_file, length) that takes in two parameters:

●   source_file is a string representing the name of a file in the current directory, which we’ll use as a basis for producing our text

●   length is an integer representing the number of words in our output text

The function should do the following:

●   Open the target source file for reading (you can assume that it exists).

●   Read in the entire file as one gigantic string.

●   Use .split() with no arguments to split the whole file into a list of “words”. There will still be some punctuation attached to some of the words, but you should leave it in - it often makes the output sentences more convincing.

●   Make a dictionary counting how many times each “word” appears in the list. Note that this will count words that contain the same letters but different punctuation or capitalization as different “words”, and that’s fine.
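The first steps above might look like the following minimal sketch. The helper name count_words is hypothetical, and opening the file as UTF-8 is an assumption about how the sample texts are encoded:

```python
def count_words(source_file):
    '''
    Purpose: Read a text file and count how many times each whitespace-separated "word" occurs.
    Parameter(s): source_file - name of a text file in the current directory
    Return Value: (words, counts) where words is the list from .split()
                  and counts maps each word -> number of occurrences
    '''
    # Assumption: the input file is UTF-8 encoded.
    with open(source_file, encoding='utf-8') as infile:
        text = infile.read()  # the whole file as one gigantic string
    words = text.split()  # no arguments: split on any run of whitespace
    counts = {}
    for word in words:
        # 'It', 'it' and 'times,' vs 'times' are deliberately kept as separate keys.
        counts[word] = counts.get(word, 0) + 1
    return words, counts
```

Returning the word list as well as the counts keeps the original word order available, which the harder version of the problem needs.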

From here, the function needs to return a string containing length words, separated by spaces.

The first word in the output text must be chosen by passing the dictionary of word counts into the weighted_choice function from Problem A: this means that the word will always be one that appears in the given file, and the more often a given word appears in the file, the more likely it is to start our output text.

How the rest of the words in the sentence are generated is up to you.

●   To do this the easy way, just use the same process you used for the first word to generate all the remaining words using a weighted random choice.

●   But, if you want an output which is a bit closer to being coherent English, you should instead try to follow the rule: If word A immediately follows word B in the output string, then word A must have immediately followed word B somewhere in the input text.

○   For example, if the input text was "It was the best of times, it was the worst of times", then if the word "the" is chosen to go into the output, the next word would be randomly chosen between "best" and "worst", since those are the only two words that immediately follow "the" in the input text.

○   The exception to this is if you pick the last word in the file - if there’s nothing that follows that word then you have no choice but to pick the next one randomly from the whole file again.

○   Consider using another dictionary to help with this task - where the keys are each word in the input file, and the values are lists or dictionaries containing every word that immediately follows that one at some point in the file.
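The harder rule can be sketched as follows. This is one possible approach, not the required one: build_followers is a hypothetical helper, the successor dictionary stores counts (so a follower that appears twice is twice as likely, and each inner dictionary can go straight into a weighted choice), and the weighted_choice defined here is only a compact stand-in for your Problem A function, built on random.choices. Reading the file as UTF-8 is also an assumption:

```python
import random


def weighted_choice(counts):
    # Compact stand-in for the Problem A function; use your own in hw09.py.
    return random.choices(list(counts), weights=list(counts.values()))[0]


def build_followers(words):
    '''
    Purpose: Record, for every word, which words follow it and how often.
    Parameter(s): words - list of words in file order
    Return Value: dictionary mapping word -> {following word: count}
    '''
    followers = {}
    for current, following in zip(words, words[1:]):  # adjacent pairs
        inner = followers.setdefault(current, {})
        inner[following] = inner.get(following, 0) + 1
    return followers


def random_sent(source_file, length):
    '''
    Purpose: Generate length words where each adjacent pair also occurs in the input.
    Parameter(s): source_file - name of the input text file; length - number of words
    Return Value: a string of length words separated by spaces
    '''
    if length <= 0:
        return ''
    with open(source_file, encoding='utf-8') as infile:
        words = infile.read().split()
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    followers = build_followers(words)

    output = [weighted_choice(counts)]  # first word must come from weighted_choice
    while len(output) < length:
        options = followers.get(output[-1])
        if options:
            output.append(weighted_choice(options))
        else:
            # Only a word that appears solely at the very end of the file has no
            # successor; fall back to a choice from the whole file.
            output.append(weighted_choice(counts))
    return ' '.join(output)
```

Replacing weighted_choice(options) with weighted_choice(counts) inside the loop turns this into the easy version.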

Hint:

●   It might be useful to make one or more helper functions for this problem - for example, a helper function that takes in the text and outputs a dictionary containing the counts for each word that appears could be useful.

Constraints:

●   Every word in the output text must appear in the input text.

●   You must call the Problem A function (weighted_choice) to determine the first word of the output text.
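A quick way to test these constraints, together with the word-count requirement, is a hypothetical checker like the one below. It is a sketch that assumes the input file is UTF-8 and compares whitespace-separated words exactly as .split() produces them:

```python
def check_sentence(source_file, sentence, length):
    '''
    Purpose: Check one generated sentence against the Problem C output constraints.
    Parameter(s): source_file - the input text file; sentence - output of random_sent;
                  length - the requested number of words
    Return Value: True if sentence has exactly length words and every word appears in the input
    '''
    with open(source_file, encoding='utf-8') as infile:
        vocabulary = set(infile.read().split())  # every "word" in the input
    output_words = sentence.split()
    return len(output_words) == length and all(word in vocabulary for word in output_words)
```

For example, check_sentence('hamlet.txt', random_sent('hamlet.txt', 20), 20) should return True on every run.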

Examples (random, so won’t necessarily match, but make sure the number of words is correct and each one appears somewhere in the input text file):

>>> random_sent('short3.txt', 10)

'for second-guessing Too late to try defying gravity I guess'

>>> random_sent('hamlet.txt', 20)

"pleasure of fear and dungeons, Denmark to those effects for look like a puff'd and we are the bloat king"

>>> random_sent('alice.txt', 30)

'into the mushroom, and seemed inclined to be otherwise than nine oclock in the stupidest tea-party I suppose Dinahll be removed,’ said the field after them!’ And washing?’ said the'