STAC67: Regression Analysis Assignment 2
Q. 1 (20 pts) This question practices using R to simulate fake data from a regression model. Use the "vote.txt" data from Assignment 1.
Before any R code that generates random numbers, call set.seed(your student number) so that we can replicate your results.
We start by assuming true regression parameters for the model. Thus, we assume that $Y_i = 46.3 + 4X_i + \epsilon_i$, with $\epsilon_i \sim N(0, 3.9^2)$. We use the predictor X (growth) that we already have from "vote.txt".
● Step 1: Simulation of the fake data
Simulate a vector Y of fake data and put this in a data frame with the same X (growth).
● Step 2: Fitting the model and keeping the estimated regression coefficients.
● Step 3: Repeating Step 1 and Step 2, 10,000 times.
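The three steps above can be sketched in R as follows. This is only a minimal illustration under stated assumptions: the `growth` vector is a made-up stand-in for the growth column of "vote.txt", the seed is a placeholder for your student number, the error standard deviation is taken to be 3.9, and the helper name `sim_once` is hypothetical. The number of repetitions is reduced to 1,000 to keep the sketch fast.

```r
# Made-up stand-in for the growth column of vote.txt (NOT the real data)
growth <- c(-0.4, 0.2, 0.7, 1.1, 1.6, 2.1, 2.4, 2.9, 3.3, 3.8, 4.2)

set.seed(1234)  # placeholder; the assignment asks for your student number

# Assumed true parameters (error SD assumed to be 3.9)
beta0 <- 46.3
beta1 <- 4
sigma <- 3.9

# Steps 1 and 2: simulate fake Y, fit the model, keep the coefficients
sim_once <- function(x) {
  y <- beta0 + beta1 * x + rnorm(length(x), mean = 0, sd = sigma)
  fake <- data.frame(x = x, y = y)
  coef(lm(y ~ x, data = fake))
}

# Step 3: repeat (1,000 times here; the assignment asks for 10,000)
est <- t(replicate(1000, sim_once(growth)))
colnames(est) <- c("b0", "b1")

# Theoretical standard deviations of the two estimators
Sxx <- sum((growth - mean(growth))^2)
sd_b1 <- sigma / sqrt(Sxx)
sd_b0 <- sigma * sqrt(1 / length(growth) + mean(growth)^2 / Sxx)

colMeans(est)      # compare with c(beta0, beta1)
apply(est, 2, sd)  # compare with c(sd_b0, sd_b1)
```

A histogram of each column of `est` (for example `hist(est[, "b1"], freq = FALSE)`) can then be overlaid with `curve(dnorm(x, beta1, sd_b1), add = TRUE)`.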
(a) (5 pts) Do Step 1 and Step 2. Obtain the least squares estimates of β0 and β1 from the fake data.
Also, compute the estimated E(Y | X0 = 0.1) and obtain a 95% confidence interval for E(Y | X0 = 0.1) by hand, and compare it with the result from the built-in R function.
(b) (10 pts) Do Step 3. Make a histogram of the 10,000 values of $\hat{\beta}_0$ and another of the 10,000 values of $\hat{\beta}_1$. Superimpose (overlay) the theoretical distribution on each histogram. Calculate the mean and standard deviation of each set of 10,000 estimates. Are the results consistent with the theoretical values?
(c) (5 pts) Do Step 3. Generate 10,000 95% confidence intervals for E(Y | X0 = 0.1). What proportion of the 10,000 confidence intervals includes the true E(Y | X0 = 0.1)? Is this result consistent with theory?
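As a rough sketch of the "by hand" interval and the coverage check, the code below computes a confidence interval for the mean response from the usual simple linear regression formula, compares it with `predict()`, and estimates coverage over repeated simulations. The `growth` vector is again a made-up stand-in for the real data, the seed is a placeholder, the error standard deviation is assumed to be 3.9, and `ci_by_hand` is a hypothetical helper name. Only 2,000 repetitions are used here.

```r
# Made-up stand-in for the growth column of vote.txt (NOT the real data)
growth <- c(-0.4, 0.2, 0.7, 1.1, 1.6, 2.1, 2.4, 2.9, 3.3, 3.8, 4.2)
set.seed(1234)  # placeholder for your student number

beta0 <- 46.3; beta1 <- 4; sigma <- 3.9  # sigma = 3.9 is an assumption
x0 <- 0.1
n <- length(growth)

# Confidence interval for E(Y | X = x0) from the textbook formula
ci_by_hand <- function(fit, x, x0, level = 0.95) {
  n <- length(x)
  b <- coef(fit)
  mse <- sum(resid(fit)^2) / (n - 2)
  se <- sqrt(mse * (1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)))
  est <- unname(b[1] + b[2] * x0)
  tq <- qt(1 - (1 - level) / 2, df = n - 2)
  c(fit = est, lwr = est - tq * se, upr = est + tq * se)
}

# One fake data set: by hand versus predict()
y <- beta0 + beta1 * growth + rnorm(n, mean = 0, sd = sigma)
fake <- data.frame(x = growth, y = y)
fit <- lm(y ~ x, data = fake)

hand <- ci_by_hand(fit, growth, x0)
builtin <- predict(fit, newdata = data.frame(x = x0),
                   interval = "confidence", level = 0.95)

# Coverage: how often does the interval contain the true mean response?
true_mean <- beta0 + beta1 * x0
covered <- replicate(2000, {
  y <- beta0 + beta1 * growth + rnorm(n, mean = 0, sd = sigma)
  ci <- ci_by_hand(lm(y ~ growth), growth, x0)
  ci["lwr"] <= true_mean && true_mean <= ci["upr"]
})
mean(covered)  # compare with the nominal level 0.95
```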
Q. 2 (15 pts) This question practices building an R function.
Using the following simple dataset, build a Box-Cox transformation function in R (follow the steps described in the lecture notes) and compare the result with the built-in R function.
x <- c(0:9)
y <- c(98, 135, 162, 178, 221, 232, 283, 300, 374, 395)
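One possible shape for such a function is sketched below. It assumes the common textbook procedure (standardize the transformed response with the geometric mean of Y, regress it on X, and choose the λ with the smallest SSE); the steps in the lecture notes may differ. The function name `boxcox_sse` and the grid of λ values are illustrative choices. The comparison uses `boxcox()` from the MASS package and is skipped if MASS is not installed.

```r
x <- c(0:9)
y <- c(98, 135, 162, 178, 221, 232, 283, 300, 374, 395)

# SSE from regressing the standardized Box-Cox transform of y on x
boxcox_sse <- function(lambda, x, y) {
  gm <- exp(mean(log(y)))  # geometric mean of y
  w <- if (abs(lambda) < 1e-8) {
    gm * log(y)                                  # limiting case lambda = 0
  } else {
    (y^lambda - 1) / (lambda * gm^(lambda - 1))  # standardized power transform
  }
  sum(resid(lm(w ~ x))^2)
}

# Evaluate the SSE on a grid and pick the minimizer
lambdas <- seq(-2, 2, by = 0.1)
sse <- sapply(lambdas, boxcox_sse, x = x, y = y)
lambda_hat <- lambdas[which.min(sse)]
lambda_hat

# Comparison with the MASS implementation on the same grid, if available
if (requireNamespace("MASS", quietly = TRUE)) {
  bc <- MASS::boxcox(lm(y ~ x), lambda = lambdas, plotit = FALSE)
  print(bc$x[which.max(bc$y)])  # lambda with the largest log-likelihood
}
```

Minimizing this SSE is equivalent to maximizing the profile log-likelihood that `MASS::boxcox()` reports, so the two answers should agree on a common grid.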
Q. 3 (30 pts) (5 pts each) The dataset "kidiq.csv" is posted on Quercus. It contains children's test scores (Y = kid.score) and mothers' IQ scores (X = mom.iq). The data are from a survey of adult American women and their children (a subsample from the National Longitudinal Survey of Youth). We fit a regression model predicting cognitive scores of preschoolers given their mothers' IQ scores.
(a) Fit a simple linear regression relating test scores (Y) to mother's IQ scores (X) using R. Construct a 95% confidence interval for the mean test score of all kids whose mother's IQ score = 110. Compute it by hand (using R) and compare the result with the built-in R function, predict().
(b) Construct a 99% prediction interval for a new kid's test score when his or her mother's IQ score = 110. Compute it by hand (using R) and compare the result with the built-in R function.
(c) Plot the residuals versus fitted values. Comment on the plot.
(d) Obtain a normal probability plot of residuals and test the hypothesis that the errors are normally distributed with the Shapiro-Wilk test. Comment on the graph and test result with α = 0.05.
(e) We would like to conduct the Breusch-Pagan test to determine whether or not the error variance varies with the level of X. Install the package "lmtest" and use the following R code:
> library(lmtest)
> bptest(lm_object)
What is your test result with α = 0.05?
(f) If there is evidence of non-normality or non-constant variance of errors, obtain a Box-Cox transformation (use the built-in function), and repeat the previous parts (d) and (e).
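To illustrate the mechanics behind parts (b), (d) and (e), here is a small sketch on simulated stand-in data, since "kidiq.csv" itself is not reproduced here. The simulated coefficients, sample size and seed are arbitrary assumptions, not values from the real dataset. The Breusch-Pagan statistic is computed by hand in its studentized form (n times the R-squared from regressing the squared residuals on X), which is assumed here to correspond to the default behaviour of `bptest()`.

```r
# Simulated stand-in for kidiq.csv (arbitrary assumed parameters, NOT the real data)
set.seed(1)
n <- 200
mom.iq <- rnorm(n, mean = 100, sd = 15)
kid.score <- 26 + 0.6 * mom.iq + rnorm(n, mean = 0, sd = 18)
kidiq <- data.frame(kid.score = kid.score, mom.iq = mom.iq)

fit <- lm(kid.score ~ mom.iq, data = kidiq)

# (b) 99% prediction interval at mom.iq = 110, by hand
x0 <- 110
xbar <- mean(kidiq$mom.iq)
Sxx <- sum((kidiq$mom.iq - xbar)^2)
mse <- sum(resid(fit)^2) / (n - 2)
yhat0 <- unname(coef(fit)[1] + coef(fit)[2] * x0)
se_pred <- sqrt(mse * (1 + 1 / n + (x0 - xbar)^2 / Sxx))  # note the extra "1 +"
tq <- qt(1 - 0.01 / 2, df = n - 2)
pi_hand <- c(yhat0 - tq * se_pred, yhat0 + tq * se_pred)

# Same interval from the built-in function
pi_builtin <- predict(fit, newdata = data.frame(mom.iq = x0),
                      interval = "prediction", level = 0.99)

# (d) Shapiro-Wilk test on the residuals
sw <- shapiro.test(resid(fit))

# (e) Studentized Breusch-Pagan statistic by hand
e2 <- resid(fit)^2
aux <- lm(e2 ~ mom.iq, data = kidiq)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

pi_hand; pi_builtin; sw$p.value; c(bp_stat, bp_p)
```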
Q. 4 (20 pts) (5 pts each) A simple linear regression was fit, relating the modulus of a tire (Y) to the number of weeks (X) heated at 125 degrees Celsius, with the data given below:
Xi (Weeks): 0 1 2 4 6 15
Yi (Modulus): 2.3 4.2 5.2 5.9 6.3 7.2
Use the simple linear regression in matrix form.
(a) Obtain the design matrix X and Y
(b) Obtain the vector of estimated regression coefficients, $\hat{\beta}$, and the vector …
(c) Compute the estimated variance-covariance matrix of $\hat{\beta}$, $\widehat{\mathrm{Var}}(\hat{\beta})$.
(d) Find the hat matrix H. What does hii equal? Here, hij is the element in H in the ith row and jth column.
(e) Find the estimated variance-covariance matrix of the residual vector, $\widehat{\mathrm{Var}}(e)$.
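The matrix computations in this question can be sketched in R with the tire data given above. The variable names are illustrative, and the formulas used are the standard ones for simple linear regression in matrix form: $\hat{\beta} = (X'X)^{-1}X'Y$, $H = X(X'X)^{-1}X'$, $e = (I - H)Y$, with the MSE multiplying $(X'X)^{-1}$ and $(I - H)$ to give the two estimated variance-covariance matrices.

```r
x <- c(0, 1, 2, 4, 6, 15)              # weeks
y <- c(2.3, 4.2, 5.2, 5.9, 6.3, 7.2)   # modulus
n <- length(y)

X <- cbind(1, x)           # design matrix: column of ones and the predictor
Y <- matrix(y, ncol = 1)   # response as a column vector

XtX_inv <- solve(t(X) %*% X)
b <- XtX_inv %*% t(X) %*% Y    # estimated regression coefficients
H <- X %*% XtX_inv %*% t(X)    # hat matrix
e <- (diag(n) - H) %*% Y       # residual vector

mse <- as.numeric(t(e) %*% e) / (n - 2)
var_b <- mse * XtX_inv         # estimated Var(beta-hat)
var_e <- mse * (diag(n) - H)   # estimated Var(e)

b
diag(H)        # the leverages h_ii
sum(diag(H))   # trace of H
```

The results can be checked against `coef(lm(y ~ x))` and `vcov(lm(y ~ x))`.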
Q. 5 (15 pts) (5 pts each) An engineer is interested in the relationship between steel thickness (X) and its breaking strength (Y). She obtains the following matrices from a matrix computer package:
$X'X = [\ldots]$, $X'Y = [\ldots]$, $Y'(I - H)Y = 20$, $Y'(H - J)Y = 250$
(a) Construct the ANOVA table based on this information.
(b) Provide a 95% confidence interval for β1.
(c) Test H0: β1 = 0 vs. Ha: β1 ≠ 0 with α = 0.05.
2022-02-26