BA222 SPRING 2022 – HW04

Due on 5/2 (Tuesday) by 11:59 PM

Instructions

• Submit your output saved as a .PDF

• Name the file as BA222_HW04_lastNameFirstName

• Label each question appropriately

Problem 1. Forecasting and Non-Linear Models

For this problem we are going to use the ausbeer.csv data to predict the quarterly beer production in Australia. Only use the data up to 1969 Q4 (inclusive) to estimate the parameters of the model. Use the rest of the data to test the predictions of the model.
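
The estimation/test split described above could look roughly like the sketch below. It runs on synthetic data because the layout of ausbeer.csv is not shown here: the column names `year`, `quarter`, and `production`, and the 1956 to 1972 date range, are assumptions made for illustration and should be replaced with whatever the real file contains.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for ausbeer.csv (assumption: the real file may use
# different column names and cover a different range of dates).
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1956, 1973), 4)   # 17 years, 4 quarters each
quarters = np.tile([1, 2, 3, 4], 17)
beer = pd.DataFrame({
    "year": years,
    "quarter": quarters,
    "production": 250 + 2.5 * np.arange(len(years)) + rng.normal(0, 10, len(years)),
})

# Estimation sample: everything up to and including 1969 Q4.
train = beer[beer["year"] <= 1969]
# Test sample: everything from 1970 Q1 onward.
test = beer[beer["year"] >= 1970]

print(len(train), len(test))
```

Filtering on the year alone is enough here because the cutoff falls at the end of a calendar year; a cutoff in the middle of a year would also need a condition on the quarter.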

1. Make a graph of the yearly average production and one of the average production by quarter. Interpret both graphs. What different patterns can you discern from each graph?

2. Make a line plot of the quarterly beer production in Australia. Are the results consistent with your answer in part 1?

3. Estimate a linear regression model for the beer production using a linear time trend (time as independent variable). Interpret the results for the intercept and slope coefficients. Discuss the statistical significance of the beta coefficients. Inspect the residuals of the regression. Do you think they are randomly distributed, or do you detect a pattern?

4. Estimate a linear regression model for the beer production using a dummy variable for each quarter and a linear time trend. Is there a seasonal component in the Australian beer data? How can you tell?

5. Estimate a non-linear regression model for the beer production using a polynomial time trend and a dummy variable for each quarter. You need to determine the correct degree of the polynomial. Is the non-linear fit better than the linear fit? How can you tell?

6. Use the model from part 5 to predict the beer production for each quarter of 1970, 1971 and 1972. Make a graph for your predictions, including a 95 percent confidence interval and the actual values for the beer production. Do you think your predictions are accurate?
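
One possible way to set up the regressions in parts 3 to 5 is sketched below with plain least squares in NumPy. The data are synthetic and the column names (`year`, `quarter`, `t`, `production`) are assumptions, not the layout of ausbeer.csv. The sketch compares polynomial degrees by adjusted R-squared on the estimation sample and by RMSE on the held-out quarters; it does not compute standard errors or the 95 percent intervals asked for in part 6, which a regression library would normally report.

```python
import numpy as np
import pandas as pd

# Synthetic quarterly series (NOT the real ausbeer.csv); names are hypothetical.
rng = np.random.default_rng(1)
years = np.repeat(np.arange(1956, 1973), 4)
quarters = np.tile([1, 2, 3, 4], 17)
t = np.arange(len(years))                      # linear time index
seasonal = np.array([20.0, -15.0, -5.0, 40.0])[quarters - 1]
production = 240 + 4.0 * t - 0.02 * t**2 + seasonal + rng.normal(0, 8, len(t))
beer = pd.DataFrame({"year": years, "quarter": quarters, "t": t,
                     "production": production})

def design_matrix(df, degree):
    """Intercept, t, ..., t**degree, plus dummies for Q2-Q4 (Q1 is the baseline)."""
    scaled_t = df["t"].to_numpy() / 100.0      # rescale time for numerical stability
    cols = [np.ones(len(df))]
    cols += [scaled_t ** d for d in range(1, degree + 1)]
    cols += [(df["quarter"] == q).to_numpy(dtype=float) for q in (2, 3, 4)]
    return np.column_stack(cols)

train = beer[beer["year"] <= 1969]
test = beer[beer["year"] >= 1970]
y_train = train["production"].to_numpy()
y_test = test["production"].to_numpy()

results = {}
for degree in (1, 2, 3):
    X_train = design_matrix(train, degree)
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    resid = y_train - X_train @ beta
    n, k = X_train.shape                       # k counts the intercept
    r2 = 1 - (resid @ resid) / np.sum((y_train - y_train.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    pred = design_matrix(test, degree) @ beta
    rmse = np.sqrt(np.mean((y_test - pred) ** 2))
    results[degree] = {"adj_r2": adj_r2, "test_rmse": rmse}

print(pd.DataFrame(results).T.round(3))
```

Dropping the Q1 dummy avoids perfect collinearity with the intercept, so each remaining dummy coefficient measures the difference from Q1. A higher-degree polynomial never lowers the in-sample R-squared, which is why the comparison above relies on adjusted R-squared and out-of-sample error instead.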

Problem 2. Unsupervised Machine Learning: Clustering

For this problem use the spotify_songs.csv data to create playlists of similar songs using the K-means algorithm.

1. Inspect the variables in the dataset using the .head() command. Create a separate data frame including only the numeric variables in the data set (see the .select_dtypes() method). Also, use the .drop() method to delete the track_popularity variable as this will bias your results.

2. Use the preprocessing.scale() function to standardize each variable.

3. Run the K-means algorithm with 4 clusters.

4. Use the estimation from the K-means algorithm to predict the cluster value for each observation. Use this information to compute the average value of each numeric variable per cluster. Describe what makes each cluster distinct from the others.

5. For each cluster, sort the observations by how close they are to their own centroid (see the euclidean_distances() function for help on calculating the distances). Using the nearest 100 songs to the centroid, take a random sample of five songs and call it a playlist. Listen to a playlist from each of the clusters and describe the playlist as best as you can. If you had to quickly describe each playlist to a friend, how would you describe them?

6. (Optional) Write a function that given a song generates a list of five similar songs selected randomly.
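
The clustering steps above could be wired together roughly as in the sketch below. It uses randomly generated features rather than spotify_songs.csv, so the column names `track_name`, `danceability`, `energy`, and `tempo` are assumptions, as is the helper `similar_songs`, which is one hypothetical reading of the optional part. The scikit-learn calls are the ones named in the steps plus `KMeans`.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances

# Synthetic stand-in for spotify_songs.csv; feature names are hypothetical.
rng = np.random.default_rng(2)
n = 2000
songs = pd.DataFrame({
    "track_name": [f"song_{i}" for i in range(n)],
    "track_popularity": rng.integers(0, 101, n),
    "danceability": rng.uniform(0, 1, n),
    "energy": rng.uniform(0, 1, n),
    "tempo": rng.normal(120, 25, n),
})

# Steps 1-2: keep numeric columns, drop track_popularity, standardize.
numeric = songs.select_dtypes(include="number").drop(columns="track_popularity")
X = preprocessing.scale(numeric)

# Steps 3-4: fit K-means with 4 clusters and label every song.
km = KMeans(n_clusters=4, n_init=10, random_state=0)
songs["cluster"] = km.fit_predict(X)
print(numeric.groupby(songs["cluster"]).mean().round(2))

# Step 5: distance from each song to the centroid of its own cluster.
dist_all = euclidean_distances(X, km.cluster_centers_)
songs["dist"] = dist_all[np.arange(n), songs["cluster"].to_numpy()]

playlists = {}
for c, group in songs.groupby("cluster"):
    nearest = group.nsmallest(100, "dist")     # 100 songs closest to the centroid
    playlists[c] = nearest.sample(5, random_state=0)["track_name"].tolist()
print(playlists)

# Step 6 (optional): five random picks among the songs nearest to a given song.
def similar_songs(name, k=5, pool=100, seed=None):
    idx = songs.index[songs["track_name"] == name][0]
    d = euclidean_distances(X[[idx]], X).ravel()
    order = np.argsort(d)
    candidates = order[order != idx][:pool]    # exclude the song itself
    chosen = np.random.default_rng(seed).choice(candidates, size=k, replace=False)
    return songs.loc[chosen, "track_name"].tolist()

print(similar_songs("song_0", seed=1))
```

Because the features here are random, the clusters carry no musical meaning; the sketch only illustrates the mechanics. The positional lookup in `similar_songs` relies on the data frame keeping its default 0 to n-1 index.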
