STAC67: Regression Analysis

Assignment 2

Q. 1 (20 pts) This question gives practice using R to simulate fake data from a regression model. Use the “vote.txt” data from Assignment 1.

Whenever you generate random numbers, call set.seed(your student number) in R before the code that generates them, so that we can replicate your results.

We start by assuming true regression parameters in the model. Thus, we assume that Yi = 46.3 + 4Xi + εi, with εi ~ N(0, 3.9²). We use the predictor X (growth) that we already have from “vote.txt”.

● Step 1: Simulation of the fake data

Simulate a vector Y of fake data and put this in a data frame with the same X (growth).

● Step 2: Fitting the model and keeping the estimated regression coefficients.

● Step 3: Repeating Step 1 and Step 2, 10,000 times.

(a) (5 pts) Do Step 1 and Step 2. Obtain the least squares estimates of β0 and β1 with the fake data.

Also, compute the estimated E(Y | X0 = 0.1) and obtain a 95% confidence interval for E(Y | X0 = 0.1) by hand, then compare it with the result from the R built-in function.
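
A minimal sketch of Steps 1 and 2 and of the by-hand interval follows. It makes several assumptions: the growth values are made-up placeholders (the real ones come from “vote.txt”), the seed 1234567 stands in for a student number, and the error distribution is read as having standard deviation 3.9. The by-hand interval uses the usual formula Ŷ0 ± t(0.975; n − 2) · sqrt(MSE · (1/n + (X0 − X̄)² / Sxx)).

```r
# Placeholder predictor values; in the assignment X (growth) comes from vote.txt.
growth <- c(-0.4, 0.1, 0.4, 0.9, 1.0, 1.1, 1.7, 2.3, 2.4, 2.9, 3.0, 3.6, 3.9, 4.2)
n <- length(growth)

set.seed(1234567)  # placeholder for your student number

# Step 1: simulate Y from the assumed true model (sd = 3.9 is an assumption).
fake <- data.frame(growth = growth,
                   y = 46.3 + 4 * growth + rnorm(n, mean = 0, sd = 3.9))

# Step 2: fit the model and keep the estimated coefficients.
fit <- lm(y ~ growth, data = fake)
b <- coef(fit)

# Confidence interval for E(Y | X0 = 0.1), computed by hand.
x0 <- 0.1
y_hat <- unname(b[1] + b[2] * x0)
mse <- sum(resid(fit)^2) / (n - 2)
sxx <- sum((growth - mean(growth))^2)
se_mean <- sqrt(mse * (1 / n + (x0 - mean(growth))^2 / sxx))
t_crit <- qt(0.975, df = n - 2)
ci_hand <- c(fit = y_hat,
             lwr = y_hat - t_crit * se_mean,
             upr = y_hat + t_crit * se_mean)

# The same interval from the built-in function.
ci_builtin <- predict(fit, newdata = data.frame(growth = x0),
                      interval = "confidence", level = 0.95)

print(ci_hand)
print(ci_builtin)
```

The two printed intervals should agree up to floating-point rounding.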

(b) (10 pts) Do Step 3. Make a histogram of the 10,000 βˆ0 and of the 10,000 βˆ1. Superimpose (overlay) the theoretical distribution on each histogram. Calculate the mean and standard deviation of each set of 10,000 estimates. Are the results consistent with the theoretical values?
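
One possible way to organise Step 3 is sketched below, under the same assumptions as before (placeholder growth values, placeholder seed, error standard deviation read as 3.9). Rather than looping, it relies on the fact that lm() accepts a matrix response and fits each column as a separate regression. The theoretical sampling distributions used for the overlay are βˆ1 ~ N(β1, σ²/Sxx) and βˆ0 ~ N(β0, σ²(1/n + X̄²/Sxx)).

```r
# Placeholder predictor values; in the assignment X (growth) comes from vote.txt.
growth <- c(-0.4, 0.1, 0.4, 0.9, 1.0, 1.1, 1.7, 2.3, 2.4, 2.9, 3.0, 3.6, 3.9, 4.2)
n <- length(growth)
beta0 <- 46.3; beta1 <- 4; sigma <- 3.9   # sigma = 3.9 is an assumption
n_sims <- 10000

set.seed(1234567)  # placeholder for your student number

# Each column of Y is one simulated data set (the mean vector is recycled by column).
Y <- beta0 + beta1 * growth + matrix(rnorm(n * n_sims, sd = sigma), nrow = n)

# lm() with a matrix response fits every column; coef() is then 2 x n_sims.
est <- coef(lm(Y ~ growth))
b0 <- est[1, ]
b1 <- est[2, ]

# Theoretical standard deviations of the estimators.
sxx <- sum((growth - mean(growth))^2)
sd_b0 <- sigma * sqrt(1 / n + mean(growth)^2 / sxx)
sd_b1 <- sigma / sqrt(sxx)

# Histograms on the density scale with the theoretical normal curves overlaid.
op <- par(mfrow = c(1, 2))
hist(b0, breaks = 50, freq = FALSE, main = "Intercept estimates", xlab = "b0")
curve(dnorm(x, mean = beta0, sd = sd_b0), add = TRUE, lwd = 2)
hist(b1, breaks = 50, freq = FALSE, main = "Slope estimates", xlab = "b1")
curve(dnorm(x, mean = beta1, sd = sd_b1), add = TRUE, lwd = 2)
par(op)

# Simulated versus theoretical mean and standard deviation.
summary_table <- rbind(
  b0 = c(sim_mean = mean(b0), sim_sd = sd(b0), theory_mean = beta0, theory_sd = sd_b0),
  b1 = c(sim_mean = mean(b1), sim_sd = sd(b1), theory_mean = beta1, theory_sd = sd_b1)
)
print(round(summary_table, 3))
```

An explicit loop or replicate() around Steps 1 and 2 would give the same kind of result; the matrix response is only a convenience.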

(c) (5 pts) Do Step 3. Generate 10,000 95% confidence intervals for E(Y | X0 = 0.1). What proportion of the 10,000 confidence intervals include the true E(Y | X0 = 0.1)? Is this result consistent with the theoretical value?

Q. 2 (15 pts) This question gives practice writing an R function.

Using the following simple dataset, build a Box-Cox transformation function in R (follow the steps described in the lecture notes) and compare the result with the built-in R function.
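
The coverage check can be sketched as follows, again with placeholder growth values, a placeholder seed, and the error standard deviation read as 3.9. Under the assumed model the true mean response is 46.3 + 4 × 0.1, and a correctly built 95% interval should cover it in roughly 95% of the simulated data sets.

```r
# Placeholder predictor values; in the assignment X (growth) comes from vote.txt.
growth <- c(-0.4, 0.1, 0.4, 0.9, 1.0, 1.1, 1.7, 2.3, 2.4, 2.9, 3.0, 3.6, 3.9, 4.2)
n <- length(growth)
beta0 <- 46.3; beta1 <- 4; sigma <- 3.9   # sigma = 3.9 is an assumption
x0 <- 0.1
true_mean <- beta0 + beta1 * x0
n_sims <- 10000

set.seed(1234567)  # placeholder for your student number

# One simulated data set per column, all fitted in a single lm() call.
Y <- beta0 + beta1 * growth + matrix(rnorm(n * n_sims, sd = sigma), nrow = n)
fits <- lm(Y ~ growth)
b <- coef(fits)                              # 2 x n_sims matrix of estimates
mse <- colSums(resid(fits)^2) / (n - 2)      # one MSE per simulated data set

# 95% confidence interval for E(Y | X0 = 0.1) in every simulation.
y_hat <- b[1, ] + b[2, ] * x0
sxx <- sum((growth - mean(growth))^2)
se_mean <- sqrt(mse * (1 / n + (x0 - mean(growth))^2 / sxx))
t_crit <- qt(0.975, df = n - 2)
lwr <- y_hat - t_crit * se_mean
upr <- y_hat + t_crit * se_mean

# Proportion of intervals that contain the true mean response.
coverage <- mean(lwr <= true_mean & true_mean <= upr)
print(coverage)
```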

x <- c(0:9)

y <- c(98, 135, 162, 178, 221, 232, 283, 300, 374, 395)
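
The lecture notes are not reproduced here, so the sketch below assumes the common textbook procedure: for each λ on a grid, standardize the transformed response as W = K1(Y^λ − 1) for λ ≠ 0 and W = K2 log(Y) for λ = 0, where K2 is the geometric mean of Y and K1 = 1 / (λ K2^(λ − 1)), regress W on X, and pick the λ with the smallest SSE. The function names boxcox_sse and my_boxcox are hypothetical. The comparison uses boxcox() from the MASS package, which maximizes the profile log-likelihood over the same grid.

```r
x <- 0:9
y <- c(98, 135, 162, 178, 221, 232, 283, 300, 374, 395)

# SSE from regressing the standardized Box-Cox response on x for one lambda.
boxcox_sse <- function(lambda, x, y) {
  k2 <- exp(mean(log(y)))                 # geometric mean of y
  if (abs(lambda) < 1e-8) {
    w <- k2 * log(y)                      # limiting case lambda = 0
  } else {
    k1 <- 1 / (lambda * k2^(lambda - 1))
    w <- k1 * (y^lambda - 1)
  }
  sum(resid(lm(w ~ x))^2)
}

# Evaluate the SSE over a grid of lambda values and return the minimizer.
my_boxcox <- function(x, y, lambdas = seq(-2, 2, by = 0.1)) {
  sse <- vapply(lambdas, function(l) boxcox_sse(l, x, y), numeric(1))
  list(lambda = lambdas, sse = sse, best = lambdas[which.min(sse)])
}

res <- my_boxcox(x, y)
print(res$best)

# Comparison with the built-in function (MASS is a recommended R package).
if (requireNamespace("MASS", quietly = TRUE)) {
  bc <- MASS::boxcox(y ~ x, lambda = res$lambda, plotit = FALSE)
  bc_best <- bc$x[which.max(bc$y)]
  print(bc_best)
}
```

Minimizing the standardized SSE and maximizing the profile log-likelihood are equivalent criteria, so the two selected values of λ should coincide on a common grid.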

Q. 3 (30 pts) (5 pts each) The dataset “kidiq.csv” is posted on Quercus. It contains children’s test scores (Y = kid.score) and mothers’ IQ scores (X = mom.iq). The data are from a survey of adult American women and their children (a subsample from the National Longitudinal Survey of Youth). We fit a regression model predicting cognitive test scores of preschoolers given their mothers’ IQ scores.

(a) Fit a simple linear regression relating test scores (Y) to mothers’ IQ scores (X) using R. Construct a 95% confidence interval for the mean test score of all kids whose mother’s IQ score is 110. Compute it by hand (using R) and compare the result with the built-in R function, predict().

(b) Construct a 99% prediction interval for a new kid’s test score when his or her mother’s IQ score is 110. Compute it by hand (using R) and compare the result with the built-in R function.
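
The by-hand prediction interval can be sketched as below. Because “kidiq.csv” is not included here, the data frame is simulated stand-in data with arbitrary coefficients; only the column names kid.score and mom.iq are taken from the question. The interval is Ŷ0 ± t(0.995; n − 2) · sqrt(MSE · (1 + 1/n + (X0 − X̄)² / Sxx)); dropping the leading “1 +” gives the confidence interval for the mean response asked for in (a).

```r
# Stand-in data, NOT the real kidiq.csv; the coefficients below are arbitrary.
set.seed(1)
kidiq <- data.frame(mom.iq = rnorm(200, mean = 100, sd = 15))
kidiq$kid.score <- 30 + 0.5 * kidiq$mom.iq + rnorm(200, sd = 18)

fit <- lm(kid.score ~ mom.iq, data = kidiq)

# Ingredients of the interval.
n <- nrow(kidiq)
x0 <- 110
xbar <- mean(kidiq$mom.iq)
sxx <- sum((kidiq$mom.iq - xbar)^2)
mse <- sum(resid(fit)^2) / (n - 2)
y_hat <- sum(coef(fit) * c(1, x0))

# 99% prediction interval by hand (note the extra "1 +" for a new observation).
se_pred <- sqrt(mse * (1 + 1 / n + (x0 - xbar)^2 / sxx))
t_crit <- qt(1 - 0.01 / 2, df = n - 2)
pi_hand <- c(fit = y_hat,
             lwr = y_hat - t_crit * se_pred,
             upr = y_hat + t_crit * se_pred)

# The same interval from predict().
pi_builtin <- predict(fit, newdata = data.frame(mom.iq = x0),
                      interval = "prediction", level = 0.99)

print(pi_hand)
print(pi_builtin)
```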

(d) Obtain a normal probability plot of residuals and test the hypothesis that the errors are normally distributed with the Shapiro-Wilk test. Comment on the graph and test result with α = 0.05.
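
A short sketch of the residual diagnostics, again on simulated stand-in data rather than the real “kidiq.csv”: qqnorm() and qqline() draw the normal probability plot, and shapiro.test() runs the Shapiro-Wilk test, whose null hypothesis is that the residuals come from a normal distribution.

```r
# Stand-in data, NOT the real kidiq.csv; the coefficients below are arbitrary.
set.seed(1)
kidiq <- data.frame(mom.iq = rnorm(200, mean = 100, sd = 15))
kidiq$kid.score <- 30 + 0.5 * kidiq$mom.iq + rnorm(200, sd = 18)

fit <- lm(kid.score ~ mom.iq, data = kidiq)
res <- resid(fit)

# Normal probability plot of the residuals with a reference line.
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)

# Shapiro-Wilk test; reject normality at alpha = 0.05 when the p-value is below 0.05.
sw <- shapiro.test(res)
print(sw)
reject_normality <- sw$p.value < 0.05
print(reject_normality)
```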

(e) We would like to conduct the Breusch-Pagan test to determine whether or not the error variance varies with the level of X. Install the package “lmtest” and use the following R code:

> library(lmtest)

> bptest(lm_object)

What is your test result with α = 0.05?
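
To see what the test computes, the statistic can be reproduced by hand. With its default settings, bptest() uses the studentized (Koenker) version: regress the squared residuals on X and take n times the R² of that auxiliary regression, which is compared with a chi-square distribution with 1 degree of freedom here. The sketch below uses simulated stand-in data rather than the real “kidiq.csv”, and compares with lmtest only if the package is installed.

```r
# Stand-in data, NOT the real kidiq.csv; the coefficients below are arbitrary.
set.seed(1)
kidiq <- data.frame(mom.iq = rnorm(200, mean = 100, sd = 15))
kidiq$kid.score <- 30 + 0.5 * kidiq$mom.iq + rnorm(200, sd = 18)

fit <- lm(kid.score ~ mom.iq, data = kidiq)

# Auxiliary regression of the squared residuals on the predictor.
n <- nrow(kidiq)
kidiq$e2 <- resid(fit)^2
aux <- lm(e2 ~ mom.iq, data = kidiq)

# Studentized Breusch-Pagan statistic and its chi-square p-value (1 df).
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
print(c(statistic = bp_stat, p.value = bp_p))

# Optional comparison with the packaged test.
if (requireNamespace("lmtest", quietly = TRUE)) {
  bp <- lmtest::bptest(fit)
  print(bp)
}
```

Note that this is not the unstudentized form of the test found in some textbooks; calling bptest() with studentize = FALSE gives that version.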

(f) If there is evidence of non-normality or non-constant variance of errors, obtain a Box-Cox transformation (use the built-in function), and repeat the previous parts (d) and (e).

Q. 4 (20 pts) (5 pts each) A simple linear regression was fit, relating the modulus of a tire (Y) to the number of weeks (X) it was heated at 125 °C, with results given below:

Xi (Weeks): 0 1 2 4 6 15

Yi (Modulus): 2.3 4.2 5.2 5.9 6.3 7.2

Use the matrix form of simple linear regression.

(a) Obtain the design matrix X and the response vector Y.

(b) Obtain the vector of estimated regression coefficients, β̂, and the vector […].

(d) Find the hat matrix H. What does hii equal? Here, hij is the element in H in the ith row and jth column.
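
A sketch of the matrix computations with the data above: the design matrix has a column of ones and a column of X values, β̂ = (X′X)⁻¹X′Y, and H = X(X′X)⁻¹X′. For simple linear regression the diagonal elements satisfy the standard identity hii = 1/n + (Xi − X̄)² / Σ(Xj − X̄)², which the code checks numerically. The variable names are illustrative only.

```r
weeks <- c(0, 1, 2, 4, 6, 15)
modulus <- c(2.3, 4.2, 5.2, 5.9, 6.3, 7.2)
n <- length(weeks)

# Design matrix (column of ones, then X) and response vector.
X <- cbind(intercept = 1, weeks = weeks)
Y <- matrix(modulus, ncol = 1)

# Least squares coefficients from the normal equations.
XtX_inv <- solve(t(X) %*% X)
b <- XtX_inv %*% t(X) %*% Y
print(b)

# Hat matrix and its diagonal.
H <- X %*% XtX_inv %*% t(X)
h_ii <- diag(H)

# Closed-form leverage for simple linear regression.
h_formula <- 1 / n + (weeks - mean(weeks))^2 / sum((weeks - mean(weeks))^2)
print(cbind(h_ii, h_formula))
```

The diagonal of H sums to 2, the number of regression parameters.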

(e) Find the estimated variance-covariance matrix of the residual vector, V̂ar(e).
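
Since e = (I − H)Y and Var(e) = σ²(I − H), the estimated variance-covariance matrix replaces σ² with MSE = e′e / (n − 2). A minimal sketch with the data of this question (variable names are illustrative):

```r
weeks <- c(0, 1, 2, 4, 6, 15)
modulus <- c(2.3, 4.2, 5.2, 5.9, 6.3, 7.2)
n <- length(weeks)

X <- cbind(1, weeks)
Y <- matrix(modulus, ncol = 1)
H <- X %*% solve(t(X) %*% X) %*% t(X)

# Residual vector e = (I - H) Y and the mean squared error.
I_n <- diag(n)
e <- (I_n - H) %*% Y
mse <- as.numeric(t(e) %*% e) / (n - 2)

# Estimated variance-covariance matrix of the residuals: MSE * (I - H).
var_e <- mse * (I_n - H)
print(round(var_e, 4))
```

The diagonal entries MSE(1 − hii) are the estimated variances of the individual residuals, so residuals at high-leverage points have smaller variance.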

Q. 5 (15 pts) (5 pts each) An engineer is interested in the relationship between steel thickness (X) and its breaking strength (Y). She obtains the following matrices from a matrix computer package:

X′X = […], X′Y = […], Y′(I − H)Y = 20, Y′(H − J)Y = 250