Python HW3
CS5014 Spring 2017
Coding style
Always start with a documentation string (“docstring”). In Spyder, replace the lines between the two lines of triple quotes with a brief description of the program, followed by three lines:
Author: Your Name
Python Version: X.XX
Changelog: Initial version YYYY-MM-DD

For the Python version, specify whether you used 2.7 or 3.5 to develop your code.
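
As an illustration only, a file that follows this header layout might begin as shown below. The program description and the function greeting_length are placeholders invented for this sketch, and the version number is just one of the two allowed choices.

```python
"""
Report the length of a greeting (placeholder description).

Author: Your Name
Python Version: 3.5
Changelog: Initial version YYYY-MM-DD
"""


def greeting_length(greeting):
    """Return the number of characters in greeting."""
    return len(greeting)


print(greeting_length("hello"))
```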

Please choose variable names that are descriptive. Please do not “hard code” literals. If I ask you to write a program that uses a particular value, you can assign the value to a variable at the beginning and use the variable throughout the code.
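
A small sketch of what this means in practice, using a made-up example: the name SALES_TAX_RATE, the value 0.08, and the function total_with_tax are hypothetical and are not part of either program below.

```python
# Hypothetical value, assigned once at the top instead of being
# repeated as a bare number throughout the code.
SALES_TAX_RATE = 0.08


def total_with_tax(subtotal):
    """Return the subtotal plus sales tax."""
    return subtotal * (1 + SALES_TAX_RATE)


print(total_with_tax(100.0))
```

If the rate changes, only the assignment at the top needs to be edited.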


Program 1

An excerpt from “The Grimm Fairy Tales” is available on Collab under Resources -> Data -> GrimmFairyTales_excerpt.txt. For fun, we would like to know how often “good” words versus “evil” words are used. For this exercise, we will define “good” words to be the words “beautiful”, “good”, “kind”, and “care”; and “evil” words to be “wicked”, “evil”, “kill”, and “killed”.


Write a program that reads the file GrimmFairyTales_excerpt.txt and determines the overall number of “good” words versus “evil” words. Write the counts of “good” and “evil” words to a file named “good_vs_evil.out”, along with a statement indicating whether the excerpt focused on good or evil.
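
One possible way to approach the counting step is sketched below. It is a sketch under stated assumptions, not a required design: it assumes that matching is case-insensitive and that only whole words count (so “kindness” does not count as “kind”). The function name count_words and the sample sentence are invented for illustration; the real program would read its text from the excerpt file and write its results to the output file.

```python
import re

GOOD_WORDS = {"beautiful", "good", "kind", "care"}
EVIL_WORDS = {"wicked", "evil", "kill", "killed"}


def count_words(text, targets):
    """Count whole-word, case-insensitive occurrences of the target words."""
    # Lowercase the text, then pull out runs of letters as individual words.
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for word in words if word in targets)


# Made-up sample sentence, not taken from the excerpt.
sample_text = "The kind girl was good, but the wicked queen was not kind."
good_count = count_words(sample_text, GOOD_WORDS)
evil_count = count_words(sample_text, EVIL_WORDS)
print("good: {}, evil: {}".format(good_count, evil_count))
```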


Upload your Python file and the good_vs_evil.out file to Collab.


Program 2

The file “pima-indians-diabetes.data” (located on Collab under Resources -> Data) contains comma-separated data in which each row includes the following values measured for an individual:

• Number of times pregnant
• Plasma glucose concentration at 2 hours in an oral glucose tolerance test
• Diastolic blood pressure (mm Hg)
• Triceps skin fold thickness (mm)
• 2-Hour serum insulin (mu U/ml)
• Body mass index (weight in kg/(height in m)^2)
• Diabetes pedigree function
• Age (years)
• Class variable (0 or 1)


Some of the measurements, such as 2-Hour serum insulin, Body mass index, and Age, should never be zero. A zero in one of these columns means that the value is missing from the data.

We would like to know the average of each of the three measurements (2-Hour serum insulin, Body mass index, and Age), excluding any missing values. Think about how you would compute such an average.

Write a program that will:
• Read in the data and create three lists of numeric values: one for 2-Hour serum insulin, one for Body mass index, and one for Age. (Make sure each list contains all of the values for that measurement, including any zero values.)
• Define a function called compute_ave that takes one parameter: a list passed in by the caller. The function should loop through the values in the list, compute the average of only the non-zero values, and return that average.
• In the main program, call compute_ave to compute the average 2-Hour serum insulin, the average Body mass index, and the average Age.
• Write a statement to the console that provides the three averages. (Make sure the user knows which value is associated with which set of measurements.)
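
The steps above could be sketched roughly as follows. This is one possible shape rather than the expected solution. The column positions are 0-based and assume the fields appear in the order listed earlier. The three sample rows are made up so that the sketch runs without the data file. Returning None when a list has no non-zero values is an assumption, since the assignment does not say what should happen in that case.

```python
# 0-based column positions, assuming the field order listed in the assignment.
INSULIN_COLUMN = 4
BMI_COLUMN = 5
AGE_COLUMN = 7


def compute_ave(values):
    """Return the average of the non-zero values, or None if there are none."""
    total = 0.0
    count = 0
    for value in values:
        if value != 0:
            total += value
            count += 1
    if count == 0:
        return None  # assumption: no usable values means no average
    return total / count


# Made-up rows in the same nine-column layout; these are not real records.
sample_rows = [
    "1,100,70,20,0,25.0,0.5,30,0",
    "2,120,80,30,90,0,0.3,40,1",
    "0,110,60,25,110,35.0,0.4,50,0",
]

insulin_values = []
bmi_values = []
age_values = []
for row in sample_rows:
    fields = row.split(",")
    # Keep every value, zeros included; compute_ave skips the zeros later.
    insulin_values.append(float(fields[INSULIN_COLUMN]))
    bmi_values.append(float(fields[BMI_COLUMN]))
    age_values.append(float(fields[AGE_COLUMN]))

print("Average 2-Hour serum insulin: {}".format(compute_ave(insulin_values)))
print("Average Body mass index: {}".format(compute_ave(bmi_values)))
print("Average Age: {}".format(compute_ave(age_values)))
```

In the real program, the rows would come from reading pima-indians-diabetes.data line by line instead of from sample_rows.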

Copy the results of your output to the Collab text box. Upload your Python file to Collab.