Stat 311 Homework 8

This assignment uses the file PenguinsHW8.csv. This is a popular data set; the data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network (Gorman, Williams, and Fraser 2014). This version of the data set was downloaded from GitHub Gist.

The goal is to predict penguin bill length from penguin bill depth. Bill length and bill depth are shown in the image below. (Image copied from here.)

Regression Model

1.   In the HW8 template, we provide code that creates a scatterplot of bill length on bill depth. Describe the joint relationship between bill length and bill depth.

2.   In the HW8 template, we provide code that creates a second scatterplot with plotting symbols color-coded by species. Describe the joint relationship(s) between bill length and bill depth, now taking species into account.

3.   Run a linear regression of bill length on bill depth. Show the regression summary (no interpretations needed). [Hint: refer to HW3 if you do not remember how to set up the code.]

4.   Write out the regression equation using information from the regression summary. Round the regression coefficients to two decimal places.

5.   Interpret the estimated slope of the regression equation in the context of the problem.

6.   Report and interpret r².

7.   In the HW8 template, we provide ggplot code to create model diagnostic plots (residual plot, histogram of residuals, normal QQ plot of residuals). Do you think that the assumptions for inference are met? Explain, addressing specific assumptions.
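
As a reminder of the mechanics behind Problems 3 through 6, here is a minimal sketch of fitting a simple linear regression in R and pulling out the coefficients and r². It is not the HW8 template code. The data are simulated stand-ins, not the penguin measurements, and the column name BillLenMM is an assumption made for illustration (BillDepMM matches the name used later in this assignment).

```r
# Simulated stand-in data; NOT the penguin measurements.
# The column name BillLenMM is assumed for illustration only.
set.seed(311)
n <- 50
sim <- data.frame(BillDepMM = runif(n, 13, 21))
sim$BillLenMM <- 30 + 0.8 * sim$BillDepMM + rnorm(n, sd = 2)

# Regress the response (left of ~) on the predictor (right of ~).
fit <- lm(BillLenMM ~ BillDepMM, data = sim)
summary(fit)

# Pull out the pieces needed for the regression equation and r-squared.
b  <- round(coef(fit), 2)        # named vector: (Intercept), BillDepMM
r2 <- summary(fit)$r.squared     # proportion of variance explained

cat("Predicted BillLenMM =", b[1], "+", b[2], "* BillDepMM\n")
cat("r-squared =", round(r2, 3), "\n")
```

In simple linear regression, r² equals the square of the sample correlation between the two variables, which gives a quick way to check the value reported in the summary.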

Inference for Regression (assume all assumptions for inference are met)

8.   Find a 95% confidence interval for the slope parameter using “by-hand” calculations in a code chunk. Report and interpret the interval. [Hint: pull the numbers you need from the regression summary]

9.   Perform a hypothesis test to determine whether the slope parameter is different from zero. State the hypotheses using symbols, and report the test statistic, degrees of freedom, and p-value. Include your decision and an interpretation in the context of the problem. Use a 5% significance level. [Hint: no code needed.]

10. Observation 194 is a Gentoo penguin with a bill length of 49.6 mm and a bill depth of 16.0 mm. Using the model with rounded coefficients from Problem 4, what is the residual for this penguin? Does the model overestimate or underestimate the bill length for this penguin?

11. Find the 99% confidence interval for the mean bill length when the bill depth is 16.0 mm. Interpret this interval.

12. Find the 99% prediction interval for a penguin that has a bill depth of 16.0 mm. Interpret this interval.
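
The calculations in Problems 8 through 12 can be sketched as follows. This is an illustrative sketch on simulated data, not the penguin measurements. The column name BillLenMM and the observed value 45.0 are made up for the example. The sketch computes the residual from the fitted model with predict(), whereas Problem 10 asks for the rounded equation from Problem 4.

```r
# Simulated stand-in data; NOT the penguin measurements.
set.seed(311)
n <- 50
sim <- data.frame(BillDepMM = runif(n, 13, 21))
sim$BillLenMM <- 30 + 0.8 * sim$BillDepMM + rnorm(n, sd = 2)
fit <- lm(BillLenMM ~ BillDepMM, data = sim)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
tab <- summary(fit)$coefficients
b1  <- tab["BillDepMM", "Estimate"]
se1 <- tab["BillDepMM", "Std. Error"]
df  <- fit$df.residual               # n - 2 for simple regression

# 95% CI for the slope "by hand": estimate +/- t* x SE.
t_star   <- qt(0.975, df)
ci_slope <- c(b1 - t_star * se1, b1 + t_star * se1)

# Test statistic and two-sided p-value for H0: slope = 0.
t_stat <- b1 / se1
p_val  <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Residual = observed - predicted, for a made-up observation.
new_pt    <- data.frame(BillDepMM = 16.0)
resid_new <- 45.0 - predict(fit, newdata = new_pt)   # 45.0 is hypothetical

# 99% intervals at BillDepMM = 16:
# mean response (confidence) vs. one new individual (prediction).
ci_mean <- predict(fit, newdata = new_pt, interval = "confidence", level = 0.99)
pi_new  <- predict(fit, newdata = new_pt, interval = "prediction", level = 0.99)
```

The by-hand slope interval should agree with confint(fit), and the prediction interval is always wider than the confidence interval at the same bill depth because it also accounts for the variability of an individual observation around the mean.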

Consider Penguin Species

13. In the HW8 template, we include code to run a model that includes bill depth and species. We start by allowing the slopes to vary by species (using an asterisk between bill depth and species). Since there are three species, the first species (Adelie) will be included in the intercept term and the other two species will have their own coefficients. You should see that both additional interaction terms are statistically significant (interaction terms are the ones that show up with : in the name, e.g., BillDepMM:SpeciesChinstrap). What does this mean?

14. Write out the equation for the model for Gentoo penguins, rounding all coefficients to two decimal places.

15. Calculate the residual for observation 194 (see Problem 10) based on the new model written out in Problem 14. Based on the residual, does this new model do a better job of predicting the bill length for the Gentoo penguin in observation 194?

16. Create model diagnostic plots (residual plot, histogram of residuals, normal QQ plot of residuals) for the new model from Problem 13. Do you think that the assumptions for inference are met? Explain, addressing specific assumptions. [Hint: copy and paste the code from Problem 7, then edit it so that the linear model object name is lm3.out, the object created in Problem 13.]
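
To illustrate how the coefficients of an interaction model combine into one species-specific line (the idea behind Problems 13 and 14), here is a minimal sketch on simulated data. The data, the intercepts and slopes used to generate them, the column name BillLenMM, and the object name fit_int are all assumptions for illustration. This is not the template's lm3.out model.

```r
# Simulated stand-in data with a different line per species;
# NOT the penguin measurements.
set.seed(311)
sp <- factor(rep(c("Adelie", "Chinstrap", "Gentoo"), each = 40))
true_int <- c(23, 14, 17)      # made-up intercepts, in factor-level order
true_slp <- c(0.9, 1.9, 2.0)   # made-up slopes, in factor-level order
dep <- runif(120, 13, 21)
len <- true_int[as.integer(sp)] + true_slp[as.integer(sp)] * dep +
  rnorm(120, sd = 2)
sim <- data.frame(Species = sp, BillDepMM = dep, BillLenMM = len)

# The asterisk expands to main effects plus interaction:
# BillDepMM + Species + BillDepMM:Species
fit_int <- lm(BillLenMM ~ BillDepMM * Species, data = sim)
cf <- coef(fit_int)

# Adelie is the first factor level, so it is the baseline:
# its line uses (Intercept) and BillDepMM alone.
# For another species, add that species' adjustments to the baseline.
gentoo_int <- cf["(Intercept)"] + cf["SpeciesGentoo"]
gentoo_slp <- cf["BillDepMM"] + cf["BillDepMM:SpeciesGentoo"]

cat("Gentoo line: predicted BillLenMM =",
    round(gentoo_int, 2), "+", round(gentoo_slp, 2), "* BillDepMM\n")

# Cross-check: with a full interaction, each species' line matches
# a separate regression fit to that species alone.
gentoo_only <- lm(BillLenMM ~ BillDepMM, data = subset(sim, Species == "Gentoo"))
```

Each interaction coefficient is the difference between that species' slope and the baseline species' slope, which is why it is added to the BillDepMM coefficient rather than used on its own.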