DNSC-4211 Programming for Analytics Final Examination (Fall 2019)
QUESTION 1
1. Creating functions, loops (for-loops and while-loops), and nested loops. [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’
Task1: Write a program to accept 3 integers separated by commas (e.g. 1,3,5) from a user and then find the average of those numbers without using any built-in function.
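A minimal sketch of one possible approach to Task1 follows. It assumes that "without using any built-in" means avoiding helpers such as sum() and len(); the string is still parsed with split() and int(). The function name average_of_three is a hypothetical choice, not part of the exam.

```python
def average_of_three(text):
    """Return the average of comma-separated integers such as "1,3,5"."""
    total = 0
    count = 0
    # Accumulate the total and the count by hand instead of calling sum()/len().
    for token in text.split(","):
        total += int(token)
        count += 1
    return total / count
```

In a notebook, the function could be called as average_of_three(input("Enter 3 integers separated by commas: ")).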
Task2: Write a program to accept one string as input from the user and print the same string in reverse order. (Note: Don’t use a built-in function. If the input is “apple”, the output should be “elppa”.)
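A minimal sketch for Task2, assuming the restriction rules out reversed() and slicing with a negative step. The function name reverse_string is hypothetical.

```python
def reverse_string(text):
    """Build the reversed string one character at a time."""
    reversed_text = ""
    for character in text:
        # Prepending each character puts the last character first.
        reversed_text = character + reversed_text
    return reversed_text
```

For the example in the task, reverse_string("apple") returns "elppa".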
Task3: Create three text files and name them file1.txt, file2.txt and file3.txt. Populate each file with 10 random numbers separated by commas. Keep these files in the same folder as your Python script (e.g. file1.txt could contain 4, 7, 10, 30, 2, 1, 20, 15, 8, 3). Write a program which will perform the following tasks:
a) Accept name of text file from user.
b) Read the numbers from the first text file into a list.
c) Sort the list and print the original as well as the sorted list.
d) Experiment with the other two files as well.
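Steps a) to d) could be sketched as follows, assuming each file holds integers on a single comma-separated line as in the file1.txt example. The function names read_numbers and report are hypothetical.

```python
from pathlib import Path


def read_numbers(filename):
    """Read comma-separated integers from a text file into a list."""
    text = Path(filename).read_text()
    # int() tolerates the spaces and the trailing newline around each number.
    return [int(token) for token in text.split(",") if token.strip()]


def report(filename):
    """Print the original and the sorted list, and return both."""
    numbers = read_numbers(filename)
    ordered = sorted(numbers)  # sorted() leaves the original list unchanged
    print("Original:", numbers)
    print("Sorted:  ", ordered)
    return numbers, ordered
```

To accept the file name from the user, call report(input("File name: ")) and repeat with file2.txt and file3.txt.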
QUESTION 2
2. Data Visualization using Python (matplotlib) [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’
Task1: Read the Salaries data set (Salaries.csv) and create vectors for the variables rank, discipline, phd, service, sex, and salary.
• Part A: Create a bar plot based on service that summarizes the salaries per service category. What information can we extract from the plot?
• Part B: Create a box plot comparing salary and phd. Comment on the output plot based on the median and quantiles.
• Part C: Create a pie chart comparing the salary packages of ten professionals.
• Part D: Create a scatter plot that shows the relationship between two factors of an experiment. (You can assume any two factors from the dataset; comment on your output and selection.)
Task2: Create a pie chart of a person's weekly time spent per activity, based on the following toy dataset:
• days = [1,2,3,4,5]
• sleeping = [7,8,6,11,7]
• eating = [2,3,4,3,2]
• working = [7,8,7,2,2]
• playing = [8,5,7,8,13]
• slices = [39,14,26,41]
• activities = ['sleeping', 'eating', 'working', 'playing']
• cols = ['c','m','r', 'b','g']
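A minimal sketch for Task2 follows. The given slices are the per-activity totals over the five days, so the sketch recomputes them from the lists. Note that cols lists five colors for four activities, so only the first four are used here. The function name plot_weekly_time and the chart title are hypothetical.

```python
days = [1, 2, 3, 4, 5]
sleeping = [7, 8, 6, 11, 7]
eating = [2, 3, 4, 3, 2]
working = [7, 8, 7, 2, 2]
playing = [8, 5, 7, 8, 13]
activities = ['sleeping', 'eating', 'working', 'playing']
cols = ['c', 'm', 'r', 'b', 'g']

# Each slice is the total number of hours for one activity over the five days.
slices = [sum(sleeping), sum(eating), sum(working), sum(playing)]


def plot_weekly_time(filename=None):
    """Draw the pie chart; save it if a filename is given, otherwise show it."""
    # Imported here so the totals above can be checked without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(slices, labels=activities, colors=cols[:len(slices)],
           autopct='%1.1f%%', startangle=90)
    ax.set_title('Weekly time spent per activity')
    if filename:
        fig.savefig(filename)
    else:
        plt.show()
    return fig
```

Calling plot_weekly_time() in a notebook cell displays the chart.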
QUESTION 3
3. Building Prediction Models [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’
Task1: Read the ‘Marketing_MSA.xlsx’ file. The Master’s in Business Administration (MBA) was once a flagship program in business schools in the United States. While elite schools like GWSB, Georgetown, and Chicago are still attracting applicants, other schools are finding it much harder to entice students. As a result, business schools are focusing on specialized master’s programs to give graduates the extra skills necessary to be career ready and successful in more technically challenging fields. An educational researcher is trying to analyze the determinants of the applicant pool for the specialized Master of Science in Accounting (MSA) program at medium-sized universities in the United States. Two important determinants are the marketing expense of the business school and the percentage of the MSA alumni who were employed within three months after graduation. Consider the data collected on the number of applications received (Applicants), marketing expense (Marketing, in $1,000s), and the percentage employed within three months (Employed).
• Part A: Estimate and interpret the effect of Marketing and Employed on the number of applications received. For a given marketing expense of $80,000, predict the number of applications received if 50% of the graduates were employed within three months. Repeat the analysis with 80% employed within three months.
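One way to approach Part A is an ordinary least-squares fit of Applicants = b0 + b1*Marketing + b2*Employed. The sketch below uses NumPy and makes two assumptions: Employed is stored in percentage points (50 rather than 0.50), and a linear model is appropriate. The function names are hypothetical, and no coefficients are shown because they depend on the data file.

```python
import numpy as np


def fit_linear_model(marketing, employed, applicants):
    """Least-squares fit of Applicants = b0 + b1*Marketing + b2*Employed."""
    # The column of ones estimates the intercept b0.
    X = np.column_stack([np.ones(len(marketing)), marketing, employed])
    y = np.asarray(applicants, dtype=float)
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefficients  # array [b0, b1, b2]


def predict_applicants(coefficients, marketing_dollars, employed_percent):
    """Predict applications for a marketing expense given in dollars."""
    b0, b1, b2 = coefficients
    marketing = marketing_dollars / 1000  # Marketing is recorded in $1,000s
    return b0 + b1 * marketing + b2 * employed_percent
```

With the three columns loaded from the spreadsheet (for example with pandas.read_excel), the two requested predictions would be predict_applicants(coefficients, 80000, 50) and predict_applicants(coefficients, 80000, 80). Each slope is read as the expected change in applications for a one-unit increase in that variable, holding the other fixed.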
QUESTION 4
4. Machine Learning using Python [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’
Task1: Perform K-means cluster analysis on ‘Country clusters.csv’
• Part A: Cluster the countries based on “Latitude”, “Longitude” and “Language”.
• Part B: Add a scatter plot and a dendrogram.
• Part C: Use both the elbow method and hierarchical clustering.
Task2: A national phone carrier conducted a socio-demographic study of their current mobile phone subscribers. Subscribers were asked to fill out survey questions about their current annual salaries (Salary), whether or not they live in a city (City equals 1 if living in a city, 0 otherwise), and socio-demographic information such as marital status (Married equals 1 if married, 0 otherwise), sex (Sex equals 1 if male, 0 otherwise), and whether or not they have completed a college degree (College equals 1 if college degree, 0 otherwise). Read the survey data file “Mobile Phone Sub. xlsx” collected from 196 subscribers.
• Part A: Perform agglomerative hierarchical clustering and interpret the results.
QUESTION 5
5. Data wrangling using Python / regular expressions [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’
Task1: This set of questions is based on the Summer Olympic Games, which are held every four years. The files that you will use to answer the questions are ‘summer.csv’, ‘countrydata.csv’, and ‘G20.csv’. The variables in each dataset are described below.
The first set of data is based on ‘summer.csv’
• Year: Year of the Olympics
• City: City which hosted the Olympics
• Sport: The sport as in aquatics, athletics etc.
• Discipline: The discipline in the sport, e.g. Freestyle in the sport of swimming
• Athlete: Name of athlete who won a medal
• Country: Name of country represented by the athlete
• Gender: Gender of the athlete (Men or Women)
• Event: Name of event, e.g. 50M Freestyle in swimming
• Medal: Name of medal as in Gold, Silver or Bronze
The next set of data is based on ‘countrydata.csv’
• Country: Name of country
• Code: Three-letter code for the country
• Population: Population of the country
• GDPpc: GDP per Capita
The next set of data is based on ‘G20.csv’
• Member: Name of member country
• HDI: Human Development Index. The HDI is a statistical composite index of life expectancy, education, and per capita income indicators, which is used to rank countries into four tiers of human development. A country scores a higher HDI when the lifespan is higher, the education level is higher, and the GDP per capita is higher.
• IMFClassification: Classification of the country provided by the International Monetary Fund
Answer the following question based on the above datasets:
• Part A: Plot the total number of gold medals won by countries placed in the top five based on the human development index.
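A sketch of the data preparation for Part A follows, using only the standard library. It assumes the column names listed above, counts one gold medal per row of ‘summer.csv’ (so a team event contributes one row per athlete), and accepts an optional name-to-code mapping built from ‘countrydata.csv’ in case the Country column in ‘summer.csv’ holds codes rather than names. All function names are hypothetical.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a CSV file into a list of dictionaries keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def top_hdi_members(g20_rows, n=5):
    """Return the names of the n members with the highest HDI."""
    ranked = sorted(g20_rows, key=lambda row: float(row["HDI"]), reverse=True)
    return [row["Member"] for row in ranked[:n]]


def gold_medals_by_member(summer_rows, members, code_by_name=None):
    """Count gold-medal rows for each member, translating names if needed."""
    mapping = code_by_name or {}
    # Fall back to the member name itself when no code is available.
    keys = {member: mapping.get(member, member) for member in members}
    golds = Counter(row["Country"] for row in summer_rows
                    if row["Medal"] == "Gold")
    return {member: golds[keys[member]] for member in members}
```

The resulting dictionary can be plotted as a bar chart, for example with matplotlib's plt.bar(list(counts), list(counts.values())).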
2022-12-07