STAT4038/STAT6038 RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS REGRESSION MODELLING Assignment 2 for Semester 1, 2022

Posted: 2022-05-19


Question 1 [100 marks]

Moorhens are those blue-purple-red water birds often seen down near Lake Burley Griffin in Commonwealth Park. They are characterised by large, fleshy red shields that protrude from their heads. Some scientists have collected various measurements on a group of 43 moorhens in Commonwealth Park; the data are in the file “moorhen.csv”, which is available on Wattle. The scientists have sent the data to you for analysis. The data contain the following six variables: Shield, Weight, Stern, Hb, TandT and Adult.

The e-mail accompanying the data is a little light on detail, but it suggests that moorhens form a fairly hierarchical society and that shield size is a relevant indicator of a bird’s status within its group, so the variable of most interest (the response variable) is the area of each bird’s shield (units not specified, but presumably mm²). An alternative explanation might be that a bird’s status is more strongly related to its overall size (which could be measured by the bird’s weight, presumably in mg) and that bigger birds simply have larger shields.

In this assignment, we would like to use all available variables, including Weight, to try to build a multiple regression model with Shield area as the response variable. The e-mail from the scientists that came with the data doesn’t really describe the variables Stern, Hb and TandT, except to say that they are “three lineal measurements” taken on each bird. Adult is an indicator of whether the bird is a juvenile (0) or an adult (1).

Use R to further analyse the “moorhen” data and answer these questions:

(a)  [6 marks] Fit a multiple linear regression (MLR) model with Shield as the response variable and all other numeric variables (excluding Adult) as predictors. Present the main residual plot of the residuals against the fitted values for this model. Are there any obvious problems with the underlying assumptions?
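In R the fit in part (a) would typically come from `lm()`, with the residual plot drawn from `fitted()` and `resid()`. As a language-neutral sketch of the least-squares arithmetic behind that plot, the Python below solves the normal equations directly. The helper names `solve` and `ols` and the toy numbers are hypothetical and are not the moorhen data.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:  # eliminate this column from every other row
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ...; X holds predictor rows without an intercept column."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return coef, fitted, resid
```

Plotting `resid` against `fitted` gives the residual-versus-fitted plot. With an intercept in the model the residuals always sum to zero, so it is curvature or a fan shape in that plot, not its centre, that points to a problem with the assumptions.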

(b)  [10 marks] Now fit an MLR model with ln(Shield) as the response variable, still using all the other numeric variables (not log-transformed) as explanatory variables. Again present the main residual plot of the residuals against the fitted values for this new model. Does the transformation of the response variable appear to have corrected any problems you identified in part (a)? Then test whether this model is significant overall.
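The overall significance test in part (b) is the F-test that all slope coefficients are zero; `summary()` of an R `lm` fit reports it on its last line. A minimal sketch of the statistic, assuming p = 4 slopes (Weight, Stern, Hb and TandT) and n = 43 birds, so the degrees of freedom are 4 and 38. The function names are hypothetical and the sums of squares are made up.

```python
def overall_f_from_ss(ssr, sse, n, p):
    """Overall F statistic for H0: beta_1 = ... = beta_p = 0.

    ssr: regression sum of squares, sse: residual sum of squares,
    n: number of observations, p: number of slopes (intercept excluded).
    """
    df1, df2 = p, n - p - 1
    return (ssr / df1) / (sse / df2), df1, df2


def overall_f_from_r2(r2, n, p):
    """The same statistic written in terms of R^2 = SSR / (SSR + SSE)."""
    df1, df2 = p, n - p - 1
    return (r2 / df1) / ((1 - r2) / df2), df1, df2


# Made-up sums of squares, not results from the moorhen data.
f_ss, df1, df2 = overall_f_from_ss(ssr=30.0, sse=10.0, n=43, p=4)
f_r2, _, _ = overall_f_from_r2(r2=30.0 / 40.0, n=43, p=4)
```

The log transform itself only changes the response: in R the model could be written `lm(log(Shield) ~ Weight + Stern + Hb + TandT, data = moorhen)`, assuming the data frame is named `moorhen`. R's `log()` is the natural logarithm.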

(c)  [12 marks] What are the estimated coefficients of the MLR model in part (b) and the standard errors associated with these coefficients? Interpret each of the estimated coefficients with regard to the model specification. Construct 95% Bonferroni joint confidence intervals for all the slope parameters. Comment on the t-test results in the summary output.
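For g simultaneous intervals, the Bonferroni method uses the critical value t(1 - α/(2g); n - p - 1) for every interval, so with g = 4 slopes, α = 0.05 and 38 residual degrees of freedom each interval uses the 0.99375 quantile. In R this corresponds to `confint(fit, level = 1 - 0.05/4)`, assuming `fit` is the part (b) model. Python's standard library has no t quantile, so this sketch computes one numerically (Simpson's rule on the t density, then bisection). The function names and the example estimates are hypothetical.

```python
import math


def t_pdf_factory(df):
    """Return the Student t density with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) by Simpson's rule on [0, |x|], using the symmetry of the t density."""
    if x < 0:
        return 1 - t_cdf(-x, df, steps)
    pdf = t_pdf_factory(df)
    h = x / steps
    s = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + s * h / 3


def t_quantile(prob, df):
    """Upper quantile (prob > 0.5) found by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def bonferroni_intervals(estimates, ses, df, alpha=0.05):
    """Joint (1 - alpha) Bonferroni intervals; returns (intervals, critical value)."""
    g = len(estimates)
    crit = t_quantile(1 - alpha / (2 * g), df)
    return [(b - crit * s, b + crit * s) for b, s in zip(estimates, ses)], crit
```

Each interval is still the estimate plus or minus a multiple of its standard error; only the critical value changes relative to ordinary 95% intervals, which is why the Bonferroni intervals are wider.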

(d)  [12 marks] Produce both a scatterplot matrix and a correlation matrix for the predictors included in the model and comment on any important relationships between the variables. Do you see a problem with the MLR model in part (b)? Conduct a quantitative diagnostic check to determine the severity of this particular problem. What could be done to solve this problem?
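A standard quantitative check for multicollinearity is the variance inflation factor, VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. Equivalently, the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix, which is what this stdlib-only sketch computes. In R, `pairs()` draws the scatterplot matrix, `cor()` gives the correlation matrix and `vif()` from the car package gives the VIFs. The helper names and toy columns are hypothetical; a VIF above about 10 is a common rule-of-thumb warning sign, not a hard cut-off.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(columns):
    """Correlation matrix for a list of predictor columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def invert(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]  # scale the pivot row to a leading 1
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def vifs(columns):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    inv = invert(corr_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]
```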

(e)  [12 marks] You have now discussed this problem with the scientists and they suggest including Stern and Weight as potential predictors in the model. However, you doubt the importance of the variable Weight: you are not sure what kind of relationship there is between Weight and the response ln(Shield), given that Stern is already included in the model. Generate an appropriate plot to check this relationship visually and comment on the plot. Then conduct a partial F-test to determine whether Weight is a significant addition to a model that already includes Stern.
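One suitable plot is an added-variable (partial regression) plot: the residuals of ln(Shield) regressed on Stern, plotted against the residuals of Weight regressed on Stern. Its least-squares slope equals the Weight coefficient in the model containing both predictors, and the partial F statistic compares the residual sums of squares of the two nested models. In R these are typically `avPlots()` from the car package and `anova(reduced, full)`. The stdlib sketch below, with hypothetical helper names and toy numbers, shows both calculations.

```python
def slr_residuals(y, x):
    """Residuals from the simple linear regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def added_variable(y, x_new, x_old):
    """Coordinates of the added-variable plot for x_new given x_old, plus its slope."""
    ey = slr_residuals(y, x_old)      # part of y not explained by x_old
    ex = slr_residuals(x_new, x_old)  # part of x_new not explained by x_old
    # Both residual vectors have mean zero, so a through-origin slope is the OLS slope.
    slope = sum(a * b for a, b in zip(ex, ey)) / sum(a * a for a in ex)
    return ex, ey, slope


def partial_f(rss_reduced, rss_full, q, df_full):
    """Partial F for adding q terms; df_full is the residual df of the larger model."""
    return ((rss_reduced - rss_full) / q) / (rss_full / df_full)
```

For 43 birds, the model with Stern and Weight has df_full = 43 - 3 = 40 and q = 1; with a single added term the partial F equals the square of the t statistic for Weight in the larger model.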

(f)  [8 marks] The scientists remind you that juvenile and adult birds tend to have different shield sizes. Therefore, you want to know how the variable Adult affects the response ln(Shield). By fitting a simple linear regression model, test whether an adult bird has a larger shield than a juvenile bird. Then provide a 95% confidence interval for the slope coefficient and interpret this interval.
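With a 0/1 predictor, simple linear regression reduces to a comparison of group means: the intercept is the mean of ln(Shield) for juveniles, the slope is the adult-minus-juvenile difference in means, and the slope's t-test is the pooled-variance two-sample t-test. Because the question asks whether adults have larger shields, the alternative is one-sided (slope > 0). A stdlib sketch with a hypothetical function name and made-up values:

```python
import math


def dummy_slr(y, d):
    """OLS of y on a 0/1 indicator d. Returns (intercept, slope, se of slope, t statistic)."""
    n = len(y)
    md, my = sum(d) / n, sum(y) / n
    sxx = sum((di - md) ** 2 for di in d)
    b1 = sum((di - md) * (yi - my) for di, yi in zip(d, y)) / sxx
    b0 = my - b1 * md
    sse = sum((yi - b0 - b1 * di) ** 2 for di, yi in zip(d, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b0, b1, se_b1, b1 / se_b1


# Made-up log shield areas: three juveniles (d = 0) and three adults (d = 1).
y = [1.0, 1.2, 0.8, 2.0, 2.4, 2.2]
d = [0, 0, 0, 1, 1, 1]
b0, b1, se_b1, t_stat = dummy_slr(y, d)
```

The 95% interval is b1 ± t(0.975; n - 2) × se(b1), which R's `confint()` reports. Since the t distribution is symmetric, the one-sided p-value is half the two-sided p-value printed by `summary()` when the estimated slope is positive.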

(g)  [6 marks] Finally, given the above findings, you decide to fit an MLR model with ln(Shield) as the response variable and with Adult and Stern as predictors. Conduct a t-test for Stern in this model, compare this t-test result with the one for Stern in part (c), and comment on the reason for any difference.

(h)  [16 marks] Using the model in part (g), produce a plot of externally studentized residuals against fitted values, a normal QQ plot, a leverage plot, a Cook’s distance plot and DFBETAS plots for all the slope coefficients in your model.

Comment on the model assumptions and unusual points. Do you see any notable feature in the residual plot? If so, explain it.
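The diagnostics in part (h) are all built from the residuals e_i and leverages h_ii of the fitted model; R provides them as `rstudent()`, `hatvalues()`, `cooks.distance()` and `dfbetas()`. The sketch below shows the formulas for the studentized residuals and Cook's distance, with p the number of regression parameters including the intercept (p = 3 for the part (g) model). The function name and example inputs are hypothetical. A common rule of thumb flags leverages above 2p/n.

```python
import math


def influence_measures(e, h, sse, n, p):
    """Diagnostics for one observation with residual e and leverage h.

    sse: residual sum of squares of the full fit, n: observations,
    p: regression parameters including the intercept.
    Returns (internally studentized r, externally studentized t, Cook's distance D).
    """
    mse = sse / (n - p)
    r = e / math.sqrt(mse * (1 - h))
    mse_i = (sse - e ** 2 / (1 - h)) / (n - p - 1)  # MSE with observation i deleted
    t = e / math.sqrt(mse_i * (1 - h))
    d = r ** 2 * h / (p * (1 - h))
    return r, t, d


r, t, d = influence_measures(e=1.5, h=0.2, sse=20.0, n=43, p=3)
```

Using the deleted-case mean square `mse_i` is what makes the residual externally studentized: observation i does not contribute to the variance estimate used to scale its own residual.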

(i)  [8 marks] Generate a scatter plot of Shield (on its original scale) against Stern, using different colors for juvenile and adult birds. Use the model from part (g) to predict the expected shield area for both juvenile and adult birds over the full range of possible Stern measurements and include these on your plot as two different curves (using different colors or line types). Include appropriate titles, axis labels, a legend and a brief discussion of your plot.
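Because the part (g) model is fitted on the log scale, predictions on the original Shield scale come from exponentiating the fitted values, exp(b0 + b1·Stern + b2·Adult). If the log-scale errors are roughly normal, this back-transformed curve estimates the median rather than the mean shield area at each Stern value. In R the fitted values would typically come from `predict()` on a grid of Stern values for Adult = 0 and Adult = 1, drawn with `lines()`. The sketch below uses hypothetical coefficients and a hypothetical Stern range; with no interaction term the adult curve is always exp(b2) times the juvenile curve.

```python
import math


def grid(lo, hi, k=100):
    """k evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (k - 1) for i in range(k)]


def shield_curves(b0, b1, b2, stern_values):
    """Back-transformed predictions exp(b0 + b1*stern + b2*adult) for juveniles and adults."""
    juvenile = [math.exp(b0 + b1 * s) for s in stern_values]
    adult = [math.exp(b0 + b1 * s + b2) for s in stern_values]
    return juvenile, adult


# Hypothetical coefficients and Stern range, not estimates from the moorhen data.
stern_values = grid(40.0, 60.0)
juvenile, adult = shield_curves(b0=2.0, b1=0.05, b2=0.3, stern_values=stern_values)
```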

(j)  [10 marks] Starting from the model in part (g), consider adding the interaction term between Stern and Adult. Generate a scatter plot of ln(Shield) (on the log scale) against Stern, using different colors for juvenile and adult birds. Add fitted lines for juvenile and adult birds in different colors (or different line types). Comment on whether the plot shows a visible interaction. Then test whether the interaction is significant.
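Adding the Stern-by-Adult interaction lets the juvenile and adult lines have different slopes: in ln(Shield) = b0 + b1·Stern + b2·Adult + b3·Stern·Adult, juveniles have slope b1 and adults have slope b1 + b3, so b3 is the difference in slopes and its t-test (or an `anova()` comparison of the models with and without the interaction) tests for interaction. The coefficient estimates of a fully interacted model reproduce the two separate group lines exactly, which this sketch uses to recover them. In R the model could be written `lm(log(Shield) ~ Stern * Adult, data = moorhen)`, assuming the data frame is named `moorhen`. The function names and data below are hypothetical.

```python
def slr(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


def interaction_coefs(stern, log_shield, adult):
    """Coefficients of log_shield ~ stern * adult, recovered from separate per-group lines."""
    juv = [(s, y) for s, y, a in zip(stern, log_shield, adult) if a == 0]
    adu = [(s, y) for s, y, a in zip(stern, log_shield, adult) if a == 1]
    a0, s0 = slr(*zip(*juv))  # juvenile intercept and slope
    a1, s1 = slr(*zip(*adu))  # adult intercept and slope
    return {"intercept": a0, "stern": s0, "adult": a1 - a0, "stern:adult": s1 - s0}


# Hypothetical data lying exactly on two lines with different slopes.
stern = [1, 2, 3, 1, 2, 3]
adult = [0, 0, 0, 1, 1, 1]
log_shield = [1 + 0.5 * s if a == 0 else 2 + 0.8 * s for s, a in zip(stern, adult)]
coefs = interaction_coefs(stern, log_shield, adult)
```

Only the point estimates match the separate fits; the standard errors in the interaction model use a single pooled residual variance, so the significance test should still be taken from the combined model.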