DNSC-4211 Programming for Analytics
Mid-Term (Sample)
Date: xx/xx/2019    Time: 1 hour 15 minutes
Submit your work in a single R file or a single R Markdown file named ‘GWID.R’ or ‘GWID.RMD’.
1. Creating functions and plots [20 points]
Task_01: Write a function that counts the number of odd integers in a vector.
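A minimal sketch in R. It assumes that non-integer values (such as 2.5) and NAs should not be counted, and the name count_odd is illustrative. In R, `%%` returns a result with the sign of the divisor, so `-3 %% 2` is 1 and negative odd numbers are counted as well.

```r
# Count the odd integers in a numeric vector.
# Assumption: non-whole numbers and NAs are ignored rather than counted.
count_odd <- function(x) {
  is_whole <- !is.na(x) & x == round(x)  # TRUE only for whole numbers
  sum(is_whole & x %% 2 != 0)            # odd whole numbers have remainder 1
}

count_odd(c(1, 2, 3, 4, 5))        # 3
count_odd(c(-3, 0, 7, 2.5, NA))    # 2
```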
OR
Task_01: Use the iris dataset to create a scatter plot of sepal width against sepal length (Sepal.Width ~ Sepal.Length).
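One possible base-R sketch. iris is included with R (datasets package), so no file has to be read. With the formula `y ~ x`, sepal width goes on the vertical axis. The axis labels and title are illustrative choices.

```r
# Scatter plot of sepal width (y-axis) against sepal length (x-axis).
# iris measurements are recorded in centimetres.
plot(Sepal.Width ~ Sepal.Length, data = iris,
     xlab = "Sepal length (cm)",
     ylab = "Sepal width (cm)",
     main = "Iris: sepal width vs. sepal length")
```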
2. Loops: for-loops, while-loops, and nested loops [20 points]
Task_01: Create a list containing the values 10, 24, 100, 56, 49. Write a for-loop that prints only the numbers larger than 50.
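A minimal sketch in R. The task asks for a list, so list() is used here. A for-loop over a list visits each element in turn, and an if-statement keeps the values above 50. The extra `large` vector is only there so the result can be checked afterwards.

```r
values <- list(10, 24, 100, 56, 49)

large <- numeric(0)  # collects the printed values for later inspection
for (v in values) {
  if (v > 50) {
    print(v)             # prints [1] 100, then [1] 56
    large <- c(large, v)
  }
}
```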
OR
Task_01: Create a list containing the letters d, a, t, a, s, c, i, e, n, c, e. Write a for-loop that prints only the letters that are vowels (i.e., a, e, i, o, u).
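A minimal sketch in R. The `%in%` operator checks whether each letter belongs to the vowel set. The variable is named `chars` and not `letters` so that R's built-in `letters` constant is not masked. The `found` vector is only for inspecting the result.

```r
chars  <- list("d", "a", "t", "a", "s", "c", "i", "e", "n", "c", "e")
vowels <- c("a", "e", "i", "o", "u")

found <- character(0)  # collects the printed vowels
for (ch in chars) {
  if (ch %in% vowels) {
    print(ch)
    found <- c(found, ch)
  }
}
# Printed in order: "a", "a", "i", "e", "e"
```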
3. Linear regression with interpretation [20 points]
Task_01: Read the file “Health.xlsx”. According to the World Health Organization, an individual's health is determined, to a large extent, by factors such as income, education, genetics, and social connections. In a survey, 120 adults are asked to rate their health (Health) and their social connections (Social) on a scale of 1 to 100. Information is also gathered on their household income (Income, in $1,000s) and college education (College equals 1 if they have completed a bachelor’s degree, 0 otherwise). The accompanying file contains the relevant data.
• Part a: Estimate and interpret a regression model for Health using Social, Income, and College as the predictor variables. Predict the health rating of a college-educated person given Social = 80 and Income = 100. What if the person is not college-educated?
• Part b: Estimate and interpret an extended model that also includes the interactions of Social with Income and of Social with College. Predict the health rating of a college-educated person given Social = 80 and Income = 100. What if the person is not college-educated?
• Part c: Explain which of the two models is preferred for making predictions.
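Here is one possible sketch for parts a to c. It assumes the sheet has numeric columns named Health, Social, Income, and College (coded 0/1), for example after reading it with readxl::read_excel(). The helper name fit_health_models is hypothetical. In R's formula syntax, `Social:Income` adds only the interaction term. For part c, adjusted R-squared is used as one common criterion. The course may expect a different measure, such as the standard error of the estimate.

```r
# Hypothetical helper: fits both models and predicts for Social = 80, Income = 100.
# Assumed columns: Health, Social, Income (in $1,000s), College (1 = bachelor's, 0 = not).
fit_health_models <- function(health) {
  # Part a: main-effects model
  base_model <- lm(Health ~ Social + Income + College, data = health)

  # Part b: add the Social x Income and Social x College interactions
  ext_model <- lm(Health ~ Social + Income + College +
                    Social:Income + Social:College, data = health)

  # Same Social and Income; first row college-educated, second row not
  new_people <- data.frame(Social = 80, Income = 100, College = c(1, 0))

  list(
    base_model = base_model,
    ext_model  = ext_model,
    pred_base  = predict(base_model, newdata = new_people),
    pred_ext   = predict(ext_model, newdata = new_people),
    # Part c: one common criterion, where higher adjusted R-squared is preferred
    adj_r2 = c(base = summary(base_model)$adj.r.squared,
               ext  = summary(ext_model)$adj.r.squared)
  )
}

# Usage (assumes the readxl package is installed):
# health <- readxl::read_excel("Health.xlsx")
# res <- fit_health_models(health)
# summary(res$base_model); summary(res$ext_model)   # coefficients to interpret
# res$pred_base; res$pred_ext; res$adj_r2
```

In the main-effects model, the two predictions differ by exactly the College coefficient. In the extended model, they differ by the College coefficient plus 80 times the Social:College coefficient.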
4. Data Visualization using ggplot [20 points]
Task_01: Read “Pandemic.csv” and create a ggplot of COVID-19 conditions (new cases, deaths, and recovered cases) for any one country of your choice except Afghanistan.
Anticipated Output: [figure not reproduced]
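The layout of Pandemic.csv is not described here, so this is a sketch under assumptions. The column names Country, Date, NewCases, Deaths, and Recovered are hypothetical placeholders, and so is the helper name plot_country_covid. Rename them to match the file. The wide columns are reshaped to long format with tidyr::pivot_longer() so that ggplot2 draws one line per condition.

```r
library(ggplot2)
library(tidyr)

# Hypothetical columns: Country, Date (Date class), NewCases, Deaths, Recovered.
plot_country_covid <- function(pandemic, country) {
  one_country <- pandemic[pandemic$Country %in% country, ]  # %in% skips NA countries

  # Wide -> long: one row per (Date, Condition) pair
  long <- pivot_longer(one_country,
                       cols = c(NewCases, Deaths, Recovered),
                       names_to = "Condition", values_to = "Count")

  ggplot(long, aes(x = Date, y = Count, colour = Condition)) +
    geom_line() +
    labs(title = paste("COVID-19 conditions in", country),
         x = "Date", y = "Count")
}

# Usage (date format is an assumption; adjust as.Date() to the file):
# pandemic <- read.csv("Pandemic.csv")
# pandemic$Date <- as.Date(pandemic$Date)
# plot_country_covid(pandemic, "Italy")
```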
Task_02: Read ‘wine.csv’ and create a box plot of the 'sulphates' variable for each level of 'quality'.
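A minimal ggplot2 sketch that uses the two column names given in the task. Wrapping quality in factor() gives one box per quality score rather than a single box on a continuous axis. The helper name is illustrative. Some public wine-quality files are semicolon-separated, and in that case read.csv(..., sep = ";") is needed.

```r
library(ggplot2)

# One box of sulphates per quality score; quality is treated as a category.
plot_sulphates_by_quality <- function(wine) {
  ggplot(wine, aes(x = factor(quality), y = sulphates)) +
    geom_boxplot() +
    labs(x = "quality", y = "sulphates")
}

# Usage:
# wine <- read.csv("wine.csv")
# plot_sulphates_by_quality(wine)
```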
5. Data Wrangling using dplyr [20 points]
Read the ‘Nobel Prize.csv’ dataset and answer the following question.
• Task: Display the top 25 universities/organizations by number of prizes won.
The data fields are as follows:
• Year: The year of award
• Category: The field in which the award is given
• Prize: The full name of the prize. It appears to be derivable from other column values.
• Motivation: The reason for awarding the prize.
• Prize Share: How many people share the prize.
• Laureate ID: An integer ID for the laureate
• Laureate Type: Whether it is an individual or an organization.
• Full Name: Full name of the awardee
• Birth Date: Birth date of the awardee
• Birth City: Birth city of the awardee
• Birth Country: Birth country of the awardee
• Sex: Gender of the awardee
• Organization Name: (Only applicable if it's an organization)
• Organization City: (Only applicable if it's an organization)
• Organization Country: (Only applicable if it's an organization)
• Death Date: Date of death of the awardee
• Death City: City of death of the awardee
• Death Country: Country of death of the awardee
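A dplyr sketch for the task. It assumes the column is named `Organization Name`, which is what readr::read_csv() keeps (base read.csv() would rename it to Organization.Name). The helper name top_organizations is hypothetical. The sketch counts rows, so if the file repeats a prize once per laureate or per affiliation, each repeat adds to the total. To count distinct prizes instead, apply distinct(Year, Category, `Organization Name`) before count().

```r
library(dplyr)

# Top organizations by number of prize records (rows) in the data.
top_organizations <- function(nobel, n_top = 25) {
  nobel %>%
    filter(!is.na(`Organization Name`), `Organization Name` != "") %>%  # drop blanks
    count(`Organization Name`, sort = TRUE, name = "prizes") %>%       # tally, largest first
    slice_head(n = n_top)
}

# Usage:
# nobel <- readr::read_csv("Nobel Prize.csv")
# top_organizations(nobel)
```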
2022-10-18