MTH 303 Linear Statistical Models 2023-2024 S1
Task 1 (50 marks)
Researchers collected data from 50 different cities of China to study whether air pollution contributes to mortality. The dependent variable for analysis is age-adjusted mortality (“Mortality”). The data include variables measuring demographic characteristics of the cities, variables measuring climate characteristics, and variables recording the pollution potential of three different air pollutants. Please use R to build regression models and answer the following questions accordingly.
| #  | Variable   | Description                        |
|----|------------|------------------------------------|
| 1  | City       | City ID                            |
| 2  | JanTemp    | Mean January temperature (F)       |
| 3  | JulyTemp   | Mean July temperature (F)          |
| 4  | RelHum     | Relative humidity                  |
| 5  | Rain       | Annual rainfall (mm)               |
| 6  | PopDensity | Population density                 |
| 7  | Income     | Median income                      |
| 8  | HCPot      | HC pollution potential             |
| 9  | NOxPot     | Nitrous Oxide pollution potential  |
| 10 | SO2Pot     | Sulfur Dioxide pollution potential |
| 11 | Mortality  | Age-adjusted mortality             |
1. Start R. Install the package “readxl”. Load the libraries “Hmisc”, “leaps” and “MASS”. (2 marks)
2. Load the data from “Mortality.csv”. Show part of the dataset. (6 marks)
3. Remove “City” by assigning NULL to it, as it is an identifier column. Show part of the new dataset. (3 marks)
4. Use the new dataset from 3. Build the linear regression model with all variables considered, name it “model_mortality0”, and produce a summary of it. (3 marks)
5. Use Cook's distance to detect the outliers. Let the benchmark be 4 times the mean Cook's distance. A threshold line in red must be drawn to mark the benchmark, and the outliers should be labeled in your plot. Detailed information on the outliers should be presented using the “head” function. (8 marks)
6. Remove the detected outliers and build a new regression model; name it “model_mortality1”. Produce a summary of it and make appropriate plots to check normality and homoscedasticity. (12 marks)
7. According to the summary of “model_mortality1”, which variables are significant at the 1% level? (4 marks)
8. Conduct model selection using the stepwise method. Then use ANOVA to conduct further selection, starting from the final model given by stepwise selection. Produce a summary of your final model. (6 marks)
9. Comment on the best model in terms of R-squared, adjusted R-squared and significant variables, based on its summary. (6 marks)
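
The Cook's distance rule in question 5 (benchmark equal to 4 times the mean, red threshold line, labeled outliers) can be illustrated with a minimal sketch in base R. It runs on synthetic data, not on “Mortality.csv”: the data frame `toy`, its simulated values, the planted outlier in row 7, and the object names `fit`, `cd`, `benchmark` and `outlier_idx` are all assumptions made for illustration, and only two of the predictors from the table are imitated.

```r
# Synthetic stand-in for the assignment data (NOT the real Mortality.csv).
set.seed(303)
n <- 50
toy <- data.frame(
  JanTemp = rnorm(n, mean = 35, sd = 10),
  Rain    = rnorm(n, mean = 900, sd = 200)
)
toy$Mortality <- 900 - 2 * toy$JanTemp + 0.1 * toy$Rain + rnorm(n, sd = 20)
toy$Mortality[7] <- toy$Mortality[7] + 300  # plant one gross outlier (assumption)

# Full linear model: Mortality regressed on every other column.
fit <- lm(Mortality ~ ., data = toy)

# Cook's distance for each observation and the 4 x mean benchmark.
cd <- cooks.distance(fit)
benchmark <- 4 * mean(cd)
outlier_idx <- which(cd > benchmark)

# Needle plot with the benchmark drawn in red and the flagged points labeled.
plot(cd, type = "h", ylab = "Cook's distance",
     main = "Cook's distance per observation")
abline(h = benchmark, col = "red")
if (length(outlier_idx) > 0) {
  text(outlier_idx, cd[outlier_idx], labels = names(outlier_idx),
       pos = 3, col = "red")
}

# Detailed information on the flagged rows.
head(toy[outlier_idx, ])

# Data with the flagged rows removed, ready for refitting.
toy_clean <- if (length(outlier_idx) > 0) toy[-outlier_idx, ] else toy
```

The benchmark of 4 times the mean is a rule of thumb rather than a formal test, so flagged rows are best inspected before they are dropped. Refitting on `toy_clean` mirrors the step from the first model to the second in question 6.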
2023-11-09