Stat 311 Homework 3 – this homework uses the student performance data that was used in HW2

1.   Some basic univariate visualization of the reading and math scores.

In the HW3 template, we provide ggplot code to make histograms and boxplots of the Reading and Math variables (the Writing variable was explored in HW2). We use ggarrange from the ggpubr package to put all four graphs in a single figure; install the package first if you did not install it when working with the demo code in Quantitative.Rmd. Summarize the distributions of both variables based on what you observe in the plots.

2.   In the HW3 template, we provide code to explore the joint relationships among the three quantitative variables. The code produces pairwise joint distributions and correlations in two different figures, one ignoring lunch status and one including lunch status. You will need to install and attach the GGally and ggcorrplot packages to run the code. Include the correlations as part of your summary.

a)   Summarize the joint relationship between math and writing scores based on the scatterplot in the first plot (the first scatterplot in the third row of the black-and-white figure; the correlation is in the third cell of the first row).

b)  Summarize the additional information about the joint relationship between math and writing scores, by lunch type, that you observe in the second plot relative to the first plot. [Look at the same plot/cell in the pink/salmon figure.]
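
As a rough illustration of what the correlations in these figures measure, the sketch below computes a pairwise correlation matrix and a by-group correlation in base R. It uses simulated stand-in data, not the HW data: the data frame name `scores`, the column names, and the lunch labels are assumptions, and the numbers it prints say nothing about the real data set.

```r
# Simulated stand-in data (NOT the HW data); names and labels are assumptions.
set.seed(311)
n <- 200
math    <- rnorm(n, mean = 66, sd = 15)
reading <- 16 + 0.8 * math + rnorm(n, sd = 9)
writing <- 5 + 0.9 * reading + rnorm(n, sd = 5)
lunch   <- factor(sample(c("free/reduced", "standard"), n, replace = TRUE))
scores  <- data.frame(math, reading, writing, lunch)

# Pairwise Pearson correlations among the three quantitative variables
corr <- cor(scores[, c("math", "reading", "writing")])
round(corr, 2)

# Correlation between math and writing computed separately for each lunch group
by_lunch <- sapply(split(scores, scores$lunch),
                   function(d) cor(d$math, d$writing))
round(by_lunch, 2)
```

Comparing the overall correlation with the by-group values is one way to see whether the strength of the relationship differs between groups.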

3.   Regression of Writing on Math

a)  Fit a linear regression of writing on math [do not use any log transformations] and show the regression summary. [Hint: use the lm function to fit the linear model and summary(lm object name) to get the model summary; make sure to name your saved object lm.out1.]

b)  Write out the regression equation [round both parameter estimates to two decimal places]. No interpretations needed.

c)   In the HW3 template, we provide code to create a scatterplot of writing on math. You need to complete the geom_abline(intercept=, slope=) part of the code, filling in the values for the estimated y-intercept and slope, to create a scatterplot that shows the regression line. Interpret the scatterplot.

d)  Interpret the estimated slope parameter for the regression line in the context of the problem.

e)  Interpret the coefficient of determination in the context of the problem.

f)   A student with a math score of 75 has a residual of -13.77, rounded to two decimal places. What was their observed writing score? Show your work.
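
The mechanics behind parts 3a through 3f can be sketched in base R as follows. This is a minimal sketch on simulated stand-in data, assuming a data frame called `scores` with columns `math` and `writing` (hypothetical names; the HW data may use different ones). The coefficients it prints come from the simulation and are not the answers to this problem.

```r
# Simulated stand-in data (NOT the HW data); names are assumptions.
set.seed(311)
n <- 200
scores <- data.frame(math = round(runif(n, 30, 100)))
scores$writing <- 10 + 0.8 * scores$math + rnorm(n, sd = 8)

# 3a: regress writing (response) on math (predictor), no transformations
lm.out1 <- lm(writing ~ math, data = scores)
summary(lm.out1)

# 3b/3c: estimated intercept and slope, rounded to two decimal places
b0 <- coef(lm.out1)[["(Intercept)"]]
b1 <- coef(lm.out1)[["math"]]
round(c(intercept = b0, slope = b1), 2)

# 3e: in simple linear regression, R^2 is the squared correlation
r_squared <- summary(lm.out1)$r.squared
r <- cor(scores$math, scores$writing)
all.equal(r_squared, r^2)

# 3f: residual = observed - predicted, so observed = predicted + residual
predicted <- b0 + b1 * scores$math
resid_by_hand <- scores$writing - predicted
all.equal(resid_by_hand, unname(residuals(lm.out1)))
```

The intercept and slope extracted here are the same kind of values that go into geom_abline(intercept=, slope=) in part 3c, and the last step shows the identity that part 3f relies on: add the residual to the predicted value to recover the observed value.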

4.   Model Diagnostics

a)   In the HW3 template, we provide the code to create a residual plot for the original regression of writing on math. Do you see any patterns in this plot that indicate a violation of any of our regression assumptions? Use the residual plot to comment on the assumptions related to linearity, constant variance, and the presence/absence of outliers.

b)  In the HW3 template, we provide the code to create a histogram and normal probability plot of the residuals for the regression, putting both plots in one figure. Do you think the residuals are approximately normally distributed? Explain.

c)  Do you think that math scores are a useful variable for predicting writing scores? Explain.
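
For reference, the diagnostic plots described in this question can also be drawn with base R graphics. The sketch below is not the template's ggplot code; it is a base R equivalent on simulated stand-in data, with hypothetical object names, meant only to show which quantities each plot displays.

```r
# Simulated stand-in data (NOT the HW data); names are assumptions.
set.seed(311)
n <- 200
math    <- runif(n, 30, 100)
writing <- 10 + 0.8 * math + rnorm(n, sd = 8)
fit <- lm(writing ~ math)

res <- residuals(fit)   # observed minus predicted
fv  <- fitted(fit)      # predicted writing scores

# Residual plot: look for curvature (linearity), fanning (constant variance),
# and isolated points far from zero (outliers)
plot(fv, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Histogram and normal probability plot of the residuals, side by side
par(mfrow = c(1, 2))
hist(res, main = "Histogram of residuals", xlab = "Residual")
qqnorm(res)
qqline(res)
par(mfrow = c(1, 1))
```

Because the simulated errors are normal with constant spread, these plots show roughly what a well-behaved fit looks like; departures in the real plots are what the question asks you to describe.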

5.   Models by free/reduced lunch status. In the HW3 template, we give you code that runs a regression that includes free/reduced lunch status. In the comments we tell you how to interpret the output to get different regression lines for each lunch status.

a)  Write out the two regression lines, one for each lunch status [round parameter estimates to two decimal places]. No other interpretations needed.

b)  In the HW3 template, we provide ggplot code to create a scatterplot with all points color coded by lunch status. We also provide code to add three regression lines to the plot: the original regression line and the two lines obtained when taking lunch status into account. Of the two regression models, one without lunch status (blue line) and one with lunch status (cyan and salmon lines), which do you recommend using for predicting writing scores from math scores? Explain.
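
One way to read two regression lines off a single model, as in part 5a, is sketched below. It assumes the model includes a math-by-lunch interaction and that lunch is a factor with levels "free/reduced" and "standard". Both are assumptions, and the template's model may be specified differently, so follow the template's comments for the actual homework. The data are simulated and the variable names are hypothetical.

```r
# Simulated stand-in data (NOT the HW data); names and labels are assumptions.
set.seed(311)
n <- 200
scores <- data.frame(
  math  = runif(n, 30, 100),
  lunch = factor(sample(c("free/reduced", "standard"), n, replace = TRUE))
)
scores$writing <- 12 + 0.75 * scores$math +
  ifelse(scores$lunch == "standard", 4, 0) + rnorm(n, sd = 8)

# Assumed model form: separate intercept and slope for each lunch status
fit2 <- lm(writing ~ math * lunch, data = scores)
b <- coef(fit2)  # (Intercept), math, lunchstandard, math:lunchstandard

# Baseline group (first factor level, "free/reduced"): use the base terms
line_free <- c(intercept = b[["(Intercept)"]], slope = b[["math"]])

# Other group ("standard"): add the adjustment terms to the base terms
line_std <- c(intercept = b[["(Intercept)"]] + b[["lunchstandard"]],
              slope     = b[["math"]] + b[["math:lunchstandard"]])

round(rbind(free_reduced = line_free, standard = line_std), 2)
```

If the model has no interaction term, only the intercept differs between the two groups and both lines share the same slope.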