AS3110 Statistical Modelling 2022-23 Coursework 1
The spreadsheet Lottery.csv contains 50 observations, each relating to a different geographical area, of the following variables:
• SALES: online lottery sales to individual consumers (the response variable)
• PER PER HH: persons per household
• MED SCH YR: median years of schooling
• MED H VL: median home value (in $1000) for owner-occupied homes
• PRC RENT: % of housing that is renter-occupied
• PRC 55 P: % of population that is 55 or older
• HH MED AGE: household median age
• MED INC: median household income (in $1000)
• POP: population (in 1000)
The aim is to analyse how lottery sales relate to the explanatory variables.
Prepare a short report where the questions below are answered.
Work in R. Include any plots and chunks of R code in your document (do not submit the R script). You may use any R function or package available.
(a) Regress lottery sales against population, median years of schooling and median home value using the entire data set. Discuss the results, including e.g. diagnostic plots, parameter interpretation, confidence intervals and any other considerations you think useful. Can the model be simplified or improved using interaction terms? [15 marks]
(b) Provide some (meaningful) examples of predictions that can be made using the model ultimately chosen in (a). [7 marks]
(c) Use cross validation to estimate the test MSE for the model chosen in (a). [9 marks]
(d) Starting from the full set of all 8 covariates, use a selection process of your choice to determine the set of covariates that most efficiently explains the response variable. [9 marks]
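As a rough illustration of the kind of workflow questions (a) to (d) describe, the R sketch below runs each step on simulated data. It is not a model answer. The data frame is a made-up stand-in for Lottery.csv, the underscore column names are an assumption about how the variables are spelled in the file, and the particular choices (the POP:MED_H_VL interaction, 5 folds, backward selection by AIC with step()) are arbitrary examples of what could be tried.

```r
# Simulated stand-in for Lottery.csv (ASSUMPTION: values and underscore
# column names are invented; with the real file one would use read.csv()).
set.seed(1)
n <- 50
lottery <- data.frame(
  PER_PER_HH = rnorm(n, 2.7, 0.2),
  MED_SCH_YR = rnorm(n, 12.5, 0.5),
  MED_H_VL   = rnorm(n, 60, 15),
  PRC_RENT   = runif(n, 10, 60),
  PRC_55_P   = runif(n, 10, 35),
  HH_MED_AGE = rnorm(n, 45, 4),
  MED_INC    = rnorm(n, 40, 8),
  POP        = runif(n, 5, 100)
)
lottery$SALES <- 500 + 20 * lottery$POP - 30 * lottery$MED_SCH_YR +
  2 * lottery$MED_H_VL + rnorm(n, sd = 100)

# (a) Fit the three-covariate model, inspect estimates and 95% intervals.
fit <- lm(SALES ~ POP + MED_SCH_YR + MED_H_VL, data = lottery)
print(summary(fit))
print(confint(fit, level = 0.95))

# Standard residual diagnostics (drawn only in an interactive session).
if (interactive()) {
  par(mfrow = c(2, 2))
  plot(fit)
}

# One possible interaction term, compared with the additive model by an F-test.
fit_int <- update(fit, . ~ . + POP:MED_H_VL)
print(anova(fit, fit_int))

# (b) Prediction with a 95% prediction interval for a hypothetical new area.
new_area <- data.frame(POP = 50, MED_SCH_YR = 12.5, MED_H_VL = 60)
pred <- predict(fit, newdata = new_area, interval = "prediction", level = 0.95)
print(pred)

# (c) 5-fold cross-validation estimate of the test MSE.
k <- 5
folds <- sample(rep(1:k, length.out = n))   # random fold label per row
sq_err <- numeric(n)
for (j in 1:k) {
  test <- folds == j
  fit_j <- lm(formula(fit), data = lottery[!test, ])
  sq_err[test] <- (lottery$SALES[test] -
                     predict(fit_j, newdata = lottery[test, ]))^2
}
cv_mse <- mean(sq_err)
print(cv_mse)

# (d) Backward stepwise selection by AIC, starting from all 8 covariates.
full <- lm(SALES ~ ., data = lottery)
selected <- step(full, direction = "backward", trace = 0)
print(formula(selected))
```

Other choices would be equally defensible here, for example leave-one-out cross-validation in (c) or best-subset selection in (d); whichever is used, the report should justify it.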
2023-03-25