2022-2023 COURSEWORK

PART II (Second, Third and Final Year)

MANAGEMENT SCIENCE

MSCI 212 Statistical Methods for Business

WRITTEN COURSEWORK ASSIGNMENT

Question 1 [Worth 50% of the marks]

As in Workshop 2, use <Transform><Random Number Generators> to set your unique starting point for the SPSS random number generator. For this coursework question, use the last four digits of your library card number (i.e., your Student ID number) PLUS 1: if your library card number ends ‘4321’, type in ‘4322’, and if it ends ‘4329’, type in ‘4330’. Record these four digits at the top of your answer. Your final mark will be penalised if you use a starting point other than the one based on your library card, unless you have permission to do so from the module convenor.
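The "plus 1" rule can be checked with a few lines of Python. This is only an arithmetic sketch: the function name `spss_seed` is made up for illustration, and the value still has to be entered in SPSS by hand as described above.

```python
def spss_seed(last_four: str) -> int:
    """Return the starting point: last four digits of the ID number, plus 1."""
    if len(last_four) != 4 or not (last_four.isascii() and last_four.isdigit()):
        raise ValueError("expected exactly four digits")
    # Note: '9999' would give 10000 (five digits); the brief does not say
    # how to handle that case, so ask the module convenor if it applies.
    return int(last_four) + 1


# The two examples given in the brief.
print(spss_seed("4321"))  # 4322
print(spss_seed("4329"))  # 4330
```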

An online article is being prepared on big budget movies.  You have been asked to support the article by providing statistical analyses. You have been provided with data from a random sample of 150 movies from the top 500 movies as measured by production budget.

The SPSS data file ‘MovieStats.sav’ contains the following data for the top 500 movies as measured by production budget:

•  title – Title of the movie;

•  production cost – Estimated cost of production in $;

•  worldwide gross – Worldwide gross theatrical revenues in $;

•  opening weekend – US/Canada opening weekend revenues in $;

•  mpaa – Motion Picture Association film ratings;

•  mpa num – Re-coded film ratings (0 = NA (Not Available), 1 = G, 2 = PG, 3 = PG-13, 4 = R, 5 = Unrated);

•  genre – Genre of the movie;

•  genre num – Re-coded genres (0 = NA, 1 = Action, 2 = Adventure, 3 = Comedy, 4 = Other (Drama/Thriller/Horror/Musical/Western));

•  theaters – Number of US/Canada opening weekend theaters showing the movie;

•  runtime – Duration of the movie (in minutes);

•  year – Year of release.

Note that two versions of each categorical variable, mpaa and genre, have been provided: the original categories, and aggregated categories re-coded into numerical values. Both versions represent the same variable, but the re-coded version may be more useful in some of the charts that you may wish to use.

Draw your random sample of size 150 from this population and investigate your sample using SPSS. If your sample contains potential data anomalies you need to decide how to use (or not use) these data and report any steps you take regarding them.
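For intuition only, the idea of drawing 150 of the 500 movies without replacement from a fixed starting point can be sketched in Python. This is not the SPSS procedure: Python's generator differs from SPSS's, so the rows selected here will not match the sample SPSS gives you, and the function name `draw_sample` and the 1 to 500 row numbering are assumptions made for illustration.

```python
import random

POPULATION_SIZE = 500  # movies in the population (from the brief)
SAMPLE_SIZE = 150      # required sample size (from the brief)


def draw_sample(seed: int) -> list[int]:
    """Return sorted row numbers (1..500) of a simple random sample."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    rows = rng.sample(range(1, POPULATION_SIZE + 1), SAMPLE_SIZE)
    return sorted(rows)


rows = draw_sample(4322)  # 4322 is the example starting point from the brief
print(len(rows), len(set(rows)))  # 150 150 (no movie is selected twice)
```

Re-running with the same seed returns the same rows, which is why the brief asks you to record your starting point.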

a)  In no more than 8 pages, describe the main features of the movies in your sample, as if reporting to the writers of the article. You should include the main features of individual variables and of the relationships between them. You may include SPSS numerical and graphical output and/or you may quote values from your SPSS output. (The clarity and content of your report are both important.) [Worth about 80% of the marks for Q1]

b)  Without looking at the full data set, suggest which of the patterns/features noted in your sample (of 150) are also likely to be true of the full population of 500 movies. [No more than 2 pages. Worth about 20% of the marks for Q1]

Data source: Kaggle.

Question 2 [Worth 50% of the marks]

You should include key parts of your SPSS output in your answer. You must explain your answer clearly and you are limited to a maximum of 10 pages.

The famous CEO of Seoul Rental Bike has hired you as an external consultant to evaluate the factors, in particular weather indicators such as temperature, humidity and wind, that affect demand, measured as the total number of bikes rented each day. Specifically, he has asked you to develop a regression model that shows which factors are important for rental demand and that can predict demand under his weather scenarios. He has provided the data to analyse. It contains 353 days of data from 2017 to 2018 (Source: http://data.seoul.go.kr/, see file SeoulRentalBike.sav). The data description is as follows.

•  TotRent – Total number of bike rentals each day

•  Temp – Average temperature (in degrees Celsius, °C)

•  Hum – Average humidity (%)

•  Wind – Average wind speed (in metres per second, m/s)

•  Visib – Average visibility (in metres, m). Maximum visibility recorded is 2000 m.

•  Dew – Average dew point temperature each day (in degrees Celsius, °C). This is an alternative measure of humidity.

•  Solar – Average solar radiation (in megajoules per square metre, MJ/m²)

•  Rain – Total rainfall (in millimetres, mm)

•  Snow – Total snowfall (in centimetres, cm)

a)  Carry out a preliminary analysis of the data using scatterplots, correlations, or anything else you think appropriate to demonstrate the relationship between the total number of bike rentals each day and all explanatory variables, and any relationships between explanatory variables. Report your preliminary findings. [Worth about 20% of the marks for Q2]

b)  Use stepwise regression starting with an “all-in” model and identify “the best” model from the output. Justify your answer. Discuss the significant and insignificant variables in the model. [Worth about 20% of the marks for Q2]

c)  Redo (b) but use stepwise regression starting with a “no-variable” model. Identify “the best” model from the output. Justify your answer. Discuss the significant and insignificant variables in the model. [Worth about 10% of the marks for Q2]

d)  Compare the “best” models from (b) and (c) and identify which one is your preferred model. If the two models differ, explain what causes the difference. You can refer to your preliminary analysis to explain any difference(s) between the two “best” models. [Worth about 10% of the marks for Q2]

e)  Carry out a residuals analysis to check whether or not the usual regression assumptions seem to hold for your preferred model. Carefully justify your conclusions, noting any reservations you have about your equation. [Worth about 20% of the marks for Q2]

f)  The CEO would like you to predict the number of rentals under several scenarios he has set up. Use your preferred model to make each prediction, and comment carefully on each scenario in light of your residual diagnostics. The scenarios are as follows:

Scenario                      Temp   Humid   Wind   Visib   Dew   Solar   Rain   Snow

Decent weather                  12      60    1.5    1400     4     0.6      1      0

Hot weather                     35      80    1      1200    20     1        0      0

Cold, windy, snowy weather     -20      20    3      1000   -25     0.3      0     30

[Worth about 20% of the marks for Q2]
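When commenting on the scenarios, one thing worth checking is whether a scenario asks the model to predict outside the range of the data it was fitted on. The Python sketch below shows one way to flag such cases. The scenario values are copied from the table above; the helper name `extrapolated` is made up for illustration, and the only observed range used here is visibility (0 to 2000 m, taking the stated maximum from the data description and assuming a lower bound of zero). Ranges for the other variables are not given in this brief and would have to be read from your own SPSS descriptive statistics.

```python
# Scenario values copied from the table in part (f).
SCENARIOS = {
    "Decent weather": {"Temp": 12, "Humid": 60, "Wind": 1.5, "Visib": 1400,
                       "Dew": 4, "Solar": 0.6, "Rain": 1, "Snow": 0},
    "Hot weather": {"Temp": 35, "Humid": 80, "Wind": 1, "Visib": 1200,
                    "Dew": 20, "Solar": 1, "Rain": 0, "Snow": 0},
    "Cold, windy, snowy weather": {"Temp": -20, "Humid": 20, "Wind": 3,
                                   "Visib": 1000, "Dew": -25, "Solar": 0.3,
                                   "Rain": 0, "Snow": 30},
}


def extrapolated(scenario: dict, observed: dict) -> list[str]:
    """Return the variables whose scenario value lies outside (min, max)."""
    flagged = []
    for name, (low, high) in observed.items():
        value = scenario[name]
        if value < low or value > high:
            flagged.append(name)
    return flagged


# Only the visibility range is known from the brief; add the other
# variables' (min, max) pairs from your own descriptive statistics.
observed = {"Visib": (0, 2000)}

for label, values in SCENARIOS.items():
    print(label, extrapolated(values, observed))
```

A variable flagged by this check means the prediction relies on extrapolation, which is a reason for caution rather than proof that the prediction is wrong.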