DNSC-4211 Programming for Analytics

Mid-Term (Sample)

Date: xx/xx/2019        Time: 1 hour, 15 minutes

Submit your answers in a single R file or a single R Markdown file, named GWID.R or GWID.RMD.

 

1. Creating functions and plots [20 points]

Task_01: Write a function that counts the number of odd integers in a vector.
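One possible approach to this task is sketched below in base R. The function name count_odd is a hypothetical choice, and the decision to ignore missing and non-whole values is an assumption, since the task does not say how such inputs should be handled.

```r
# Count the odd integers in a numeric vector.
# count_odd is a hypothetical name chosen for this sketch.
count_odd <- function(x) {
  # Assumption: values that are missing, infinite, or not whole numbers
  # are skipped rather than treated as errors.
  is_whole <- is.finite(x) & x == round(x)
  # In R, %% returns a non-negative remainder for a positive divisor,
  # so negative odd numbers also give a remainder of 1.
  sum(x[is_whole] %% 2 == 1)
}

count_odd(c(1, 2, 3, 4, 5, 8, 11))  # 4
```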

OR

Task_01: Use the iris dataset to create a scatter plot of sepal width against sepal length (Sepal.Width ~ Sepal.Length).
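A minimal base R sketch of the scatter plot is shown below. It uses the built-in iris data frame; the axis labels and title are illustrative choices, not requirements of the task.

```r
# Scatter plot of sepal width (y-axis) against sepal length (x-axis).
# iris ships with base R, so no file needs to be read.
plot(Sepal.Width ~ Sepal.Length,
     data = iris,
     xlab = "Sepal length (cm)",
     ylab = "Sepal width (cm)",
     main = "Iris: sepal width vs. sepal length")
```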

2. Loops: for-loops, while-loops, and nested loops [20 points]

Task_01: Create a list that contains the following values: 10, 24, 100, 56, 49. Write a for-loop that prints only the numbers larger than 50.
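The task above could be approached as follows. This is a sketch only; the variable name values is a hypothetical choice.

```r
# Store the five values in a list, as the task asks.
values <- list(10, 24, 100, 56, 49)

# Visit each element in order and print it only if it exceeds 50.
for (v in values) {
  if (v > 50) {
    print(v)
  }
}
```

With the values given, the loop prints 100 and then 56.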

OR

Task_01: Create a list that contains the following letters: d, a, t, a, s, c, i, e, n, c, e. Write a for-loop that prints only the letters that are vowels (i.e., a, e, i, o, u).
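A comparable sketch for the vowel task is given below. The names letters_list and vowels are hypothetical, and the code assumes all letters are lowercase, as they are in the task.

```r
# The letters of "datascience", stored as a list of single characters.
letters_list <- list("d", "a", "t", "a", "s", "c", "i", "e", "n", "c", "e")

# Reference set of lowercase vowels to test membership against.
vowels <- c("a", "e", "i", "o", "u")

# Print each letter only when it appears in the vowel set.
for (ch in letters_list) {
  if (ch %in% vowels) {
    print(ch)
  }
}
```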

3. Linear regression with interpretation [20 points]

Task_01: Read the file Health.xlsx. According to the World Health Organization, an individual's health is determined to a large extent by factors such as income, education, genetics, and social connections. In a survey, 120 adults were asked to rate their health (Health) and social connections (Social) on a scale of 1 to 100. Information was also gathered on their household income (Income, in $1,000s) and college education (College equals 1 if they have completed a bachelor’s degree, 0 otherwise). The accompanying file contains the relevant data.

•   a. Estimate and interpret a regression model for Health using Social, Income, and College as the predictor variables. Predict the health rating of a college-educated person given Social = 80 and Income = 100. What if the person is not college-educated?

•   b. Estimate and interpret an extended model that includes the interactions of Social with Income and Social with College. Predict the health rating of a college-educated person given Social = 80 and Income = 100. What if the person is not college-educated?

•   c. Explain which of the two models is preferred for making predictions.
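The workflow for parts a to c might look like the sketch below. Because Health.xlsx is not reproduced here, the code builds a synthetic stand-in data frame with the same column names; the simulated relationship is arbitrary and exists only so the code runs, so its coefficients and predictions say nothing about the real data. Comparing adjusted R-squared is one common basis for choosing between models, not necessarily the criterion the course expects.

```r
# Synthetic stand-in for Health.xlsx (assumption: the real file has columns
# Health, Social, Income, College). The real data could be loaded with, for
# example, readxl::read_excel("Health.xlsx") if the readxl package is installed.
set.seed(1)
n <- 120
health_df <- data.frame(
  Social  = runif(n, 1, 100),
  Income  = runif(n, 20, 150),
  College = rbinom(n, 1, 0.5)
)
# Arbitrary made-up relationship, used only to make the example runnable.
health_df$Health <- 20 + 0.3 * health_df$Social + 0.1 * health_df$Income +
  5 * health_df$College + rnorm(n, sd = 5)

# Part a: model with main effects only.
model_a <- lm(Health ~ Social + Income + College, data = health_df)

# Part b: add the Social x Income and Social x College interactions.
model_b <- lm(Health ~ Social + Income + College +
                Social:Income + Social:College, data = health_df)

# Predictions for Social = 80 and Income = 100,
# first for a college-educated person (1), then for one who is not (0).
new_people <- data.frame(Social = 80, Income = 100, College = c(1, 0))
pred_a <- predict(model_a, newdata = new_people)
pred_b <- predict(model_b, newdata = new_people)

# Part c: compare the two fits, here by adjusted R-squared.
adj_r2 <- c(model_a = summary(model_a)$adj.r.squared,
            model_b = summary(model_b)$adj.r.squared)
print(adj_r2)
```

Calling summary(model_a) and summary(model_b) shows the coefficient estimates and p-values needed for the interpretation the questions ask for.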

4. Data Visualization using ggplot [20 points]

Task_01: Read Pandemic.csv. Create a ggplot of COVID-19 conditions (new cases, deaths, and recoveries) for any one country of your choice except Afghanistan.

Anticipated Output: [figure not available]

Task_02: Read wine.csv. Create a box plot of the 'sulphates' variable grouped by the 'quality' variable.
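A box plot for Task_02 could be sketched as below, assuming the ggplot2 package is installed. Since wine.csv is not reproduced here, the code uses a small synthetic data frame with the two column names from the task; with the real file, the wine object would instead come from read.csv("wine.csv"). Treating quality as a discrete grouping variable is an assumption.

```r
library(ggplot2)

# Synthetic stand-in for wine.csv, for illustration only.
# With the real file: wine <- read.csv("wine.csv")
set.seed(42)
wine <- data.frame(
  quality   = sample(3:8, 200, replace = TRUE),
  sulphates = runif(200, 0.3, 1.2)
)

# One box per quality score; factor() makes quality a discrete x-axis.
p <- ggplot(wine, aes(x = factor(quality), y = sulphates)) +
  geom_boxplot() +
  labs(x = "Quality", y = "Sulphates",
       title = "Sulphates by wine quality")

print(p)
```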

5. Data Wrangling using dplyr [20 points]

Read the Nobel Prize.csv dataset and answer the following question.

    Task: Display the 25 universities/organizations with the most prizes, along with the number of prizes each has won.

The data fields are as follows:

•   Year: The year of award

•   Category: The field in which the award is given

•   Prize: The full name of the prize. It appears to be derivable from other column values.

•   Motivation: The reason for awarding the prize.

•   Prize Share: How many people share the prize.

•   Laureate ID: An integer ID

•   Laureate Type: Whether it is an individual or an organization.

•   Full Name: Full name of the awardee

•   Birth Date: Birth date of the awardee

•   Birth City: Birth city of the awardee

•   Birth Country: Birth country of the awardee

•   Sex: Gender of the awardee

•   Organization Name: (Only applicable if it's an organization)

•   Organization City: (Only applicable if it's an organization)

•   Organization Country: (Only applicable if it's an organization)

•   Death Date: Date of death of the awardee

•   Death City: City of death of the awardee

•   Death Country: Country of death of the awardee
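The top-25 task could be approached with dplyr roughly as sketched below, assuming the dplyr package is installed. The tiny nobel data frame is a made-up stand-in so the code runs on its own; with the real file it would come from read.csv("Nobel Prize.csv", check.names = FALSE), which keeps the spaces in column names such as Organization Name. Counting rows per organization is an assumption about what "prizes won" means, since a shared prize can appear on several rows.

```r
library(dplyr)

# Made-up stand-in for Nobel Prize.csv, for illustration only.
# With the real file:
#   nobel <- read.csv("Nobel Prize.csv", check.names = FALSE)
nobel <- data.frame(
  `Organization Name` = c("Univ A", "Univ B", "Univ A", NA,
                          "Univ C", "Univ A", "Univ B"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

top_orgs <- nobel %>%
  # Drop rows with no organization recorded.
  filter(!is.na(`Organization Name`), `Organization Name` != "") %>%
  # Count rows per organization, sorted from most to fewest.
  count(`Organization Name`, name = "prizes", sort = TRUE) %>%
  # Keep at most the first 25 rows.
  slice_head(n = 25)

print(top_orgs)
```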