MATH38141 Regression Analysis - Coursework


This coursework accounts for 30% of your overall mark for this course and may take around 10 hours to complete. Please present your solution as a report and upload it to Blackboard as a single file before the deadline. You can use R to do your calculations, but you must show in the text the formulae (not as R code) that you have used. Marks will be awarded for correct and accurate calculations and their interpretation. Interpretations should refer to the context of the data rather than to generic symbols alone. Marks will be harder to obtain if the results are not presented effectively or if any formulae used in the calculations are missing from the text.


Air Quality in New York City

This is an exploratory analysis of an air quality data set. It contains daily air quality measurements in New York City for 30 consecutive days. Four variables are recorded each day: ozone (Y) – the surface concentration of ozone in New York City; radiation (X1) – the solar radiation; temperature (X2) – the observed temperature, in degrees Fahrenheit; wind (X3) – the wind speed, in miles per hour. Analyse the data as required to answer the following questions. State all assumptions that you have made.

(A) Draw the scatterplot matrix of ozone against each of the other three variables using the pairs function in R. Describe any observable trends in your plots.

(2 marks)
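
As a minimal sketch of how `pairs` can be called, the snippet below uses R's built-in `airquality` data set as a stand-in. This is an assumption made purely for illustration: the coursework supplies its own 30-day data, which is not reproduced here, and the column names `ozone`, `radiation`, `temperature` and `wind` are hypothetical labels chosen to match the variable descriptions.

```r
# Stand-in data (assumption): R's built-in `airquality`, complete cases only.
# The coursework's own 30-day data set should be used in its place.
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Temp", "Wind")])
names(aq) <- c("ozone", "radiation", "temperature", "wind")

# Scatterplot matrix of all four variables; the first row and column
# show ozone against each of the other three.
pairs(aq, main = "Scatterplot matrix (stand-in data)")
```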

(B) Taking the ozone as the response and the other three variables as regressors, fit a multiple linear regression model that has a constant term.

(i) State the model to be fitted.

(ii) Use R to state the following matrices: $(X^T X)^{-1} X^T Y$ and $Y^T Y$.

(iii) Calculate the least squares estimates of the three regression coefficients. How are their signs related to your visual observations in (A)?

(iv) Compile an ANOVA table for the multiple linear regression with three items: regression, residual and total corrected sums of squares.

(v) What is the p-value for the regression? What is your conclusion about the significance of X1, X2 and X3?

(vi) What is the coefficient of determination for this model? Interpret its value.

(vii) Suppose we wish to predict the difference in ozone concentration between two typical days on which (radiation, temperature, wind) are observed to be (100, 70, 10) and (50, 80, 10). Calculate the 95% prediction interval and interpret the result.

(viii) Draw the residual plot and use it to check the related model assumptions.

(14 marks)
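
The matrix quantities named in (B)(ii) can be formed directly in R. The sketch below is illustrative only and again assumes R's built-in `airquality` data as a stand-in for the coursework data; the object names (`aq`, `X`, `Y`, `beta_hat`) are hypothetical. It builds a design matrix with a column of ones for the constant term, forms $(X^T X)^{-1} X^T Y$ and $Y^T Y$, and cross-checks the estimates against `lm`.

```r
# Stand-in data (assumption): R's built-in `airquality`, complete cases only.
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Temp", "Wind")])
names(aq) <- c("ozone", "radiation", "temperature", "wind")

# Response vector (n x 1) and design matrix (n x 4) with a leading column of ones.
Y <- as.matrix(aq$ozone)
X <- cbind(1, as.matrix(aq[, c("radiation", "temperature", "wind")]))
colnames(X)[1] <- "(Intercept)"

XtX_inv  <- solve(t(X) %*% X)        # (X^T X)^{-1}
beta_hat <- XtX_inv %*% t(X) %*% Y   # (X^T X)^{-1} X^T Y, the least squares estimates
YtY      <- t(Y) %*% Y               # Y^T Y, a 1 x 1 matrix

# Cross-check: lm() should give the same coefficients up to rounding error.
fit <- lm(ozone ~ radiation + temperature + wind, data = aq)
print(cbind(matrix = drop(beta_hat), lm = coef(fit)))
```

The explicit inverse is used here only because it mirrors the formula; `lm` itself relies on a QR decomposition, which is numerically more stable.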

(C) Suppose we want to see if the temperature variable can be removed from the model in (B):

(i) State the hypothesis to test.

(ii) Obtain and interpret the 95% confidence interval for the parameter of temperature.

(iii) Conclude whether including temperature improves the fit of the model.

(4 marks)

(D) Suppose we want to assess the significance of the interaction effects between the three explanatory variables (radiation, temperature, wind speed). Consider the interaction model that uses X1 to X3 and all three pairwise interaction terms, i.e. X1X2, X1X3 and X2X3, as the regressors.

(i) Calculate the least squares estimates.

(ii) What is the coefficient of determination for this model? Interpret its value.

(iii) Obtain and interpret the 95% confidence intervals for the model parameters.

(iv) How does this model compare to the model that you obtained in (B)? Use only the results that you have obtained so far to answer this question.

(v) Describe and use a hypothesis test to compare the models obtained in (B) and (D). Report and interpret your result.

(10 marks)
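
One common way to compare two nested linear models is a partial F-test. The sketch below is illustrative and not necessarily the method the coursework expects; it assumes R's built-in `airquality` data as a stand-in, and the object names (`fit_main`, `fit_int`, `F_manual`) are hypothetical. It fits the main-effects model and the model with all pairwise interactions, computes the F statistic from the two residual sums of squares, and compares it with the output of `anova`.

```r
# Stand-in data (assumption): R's built-in `airquality`, complete cases only.
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Temp", "Wind")])
names(aq) <- c("ozone", "radiation", "temperature", "wind")

# Main-effects model and the model with all three pairwise interactions.
fit_main <- lm(ozone ~ radiation + temperature + wind, data = aq)
fit_int  <- lm(ozone ~ (radiation + temperature + wind)^2, data = aq)

# Partial F statistic from the residual sums of squares:
# F = ((RSS_main - RSS_int) / 3) / (RSS_int / residual df of the larger model)
rss_main <- sum(resid(fit_main)^2)
rss_int  <- sum(resid(fit_int)^2)
F_manual <- ((rss_main - rss_int) / 3) / (rss_int / df.residual(fit_int))

# The same test via anova() on the nested pair of models.
cmp <- anova(fit_main, fit_int)
print(cmp)
print(F_manual)
```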