闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

STATS 100C, Winter 2022

Final Exam

Notation: In is the n x n identity matrix. There are 6 problems.

Problem 1 (15 points) 5+5+5 = 15

Consider random vector

≠ = '(┌)'(┐) ~ N '(┌)-112'(┐) , I字 .

(a) Is z1 independent of z3 - z字 ? Justify your answer.

(b) For what value(s) of α e R, does (z1 + 2)3 + α(z3 - z字 )3 have a chi-square distribution?

Find the distribution of |P上≠|3 where P上 = I字 - P .

Hint: Write ≠ = μ + 叫 where μ is a ﬁxed vector and 叫 ~ N (0, I字 ) .

Problem 2 (10 points)

Consider a simple linear regression model

yi = β卜 + β1xi + εi, i = 1, 2, 3

with xi = i/3 for i = 1, 2, 3. Assume that

ε = '(┌)'(┐) ~ N '(┌)1-01

-1

What is the smallest variance for an unbiased estimate of β 1 ?

Problem 3 (20 points) 8+8+4 = 20 Consider a multiple linear regression model with the intercept, with n = 50 samples and p = 4 covariates, under the standard assumptions.

We ﬁt the model and obtain R3 = 0.6. Call this model M1 .

(a) Perform a signiﬁcance test of the linear relation in M1 at level α = 0.05.

Assume that we drop three of the four variables and the R3 drops to 0.5. Call this new model M3 . (b) Use an F-test at level 0.05 to choose between M1 and M3 .

. . .

(Hints: the intercept remains in the null model for part (a). You do not need to know the value of SST in this problem.)

Problem 4 (10 points)

Consider a linear regression problem with the intercept (included in the model). The model is ﬁt to the data, with sample size n = 12, and the following information is available about the leverage score and the residual of each data point:

i 1 2 3 4 5 6 7 8 9 10 11 12

hii 0.28 0.50 0.38 0.28 0.22 0.28 0.50 0.22 0.38 0.22 0.52 0.22

ei 0.67 1.10 -1.10 0.77 1.13 -0.03 -1.10 -0.67 -0.30 -0.87 -1.40 1.83 For your convenience, |i兰1(n) ei(3) = 12.554 and

i 1 2 3 4 5 6 7 8 9 10 11 12

31 - hii

1 - hii .

Which data point has the most inﬂuence on the regression line? Justify your answer. You can use any measure of inﬂuence as along as it is appropriate for the task.

Problem 5 (20 points) 5+5+5+5 = 20 Consider a multiple linear regression model 』= |j(p)兰卜 βjαj + ε e Rn . Assume that p = 4 and n = 100 and that the design matrix is full-rank.

(a) What is the eﬀect on the least-squares estimate of the coeﬃcients ( ) if we scale some of the covariates, that is, multiply each column αj of the design matrix X by some number αj e R? Justify your answer by providing some derivation.

Hint: The eﬀect of the above scaling on the design matrix X is to replace it with XD for some diagonal matrix D .

(b) What is the eﬀect on the SSE if we linearly combine covariates α 1 and α3 , as well as covariates

α字 and αA as follows

New design matrix = X1 := ┌α卜 α 1 + α3 α字 - αA ┐ e R1卜卜x字?

In other words, does the SSE of the new model (using the same response variable) go up, down, remain the same, or the information is not suﬃcient to determine what happens. You can also say, for example, that the SSE of the new model will be “less than or equal to” the original model and so on. Justify your answer.

(c) Can we think of the regression model in part (b) as a nested model relative to the original model? Justify your answer.

(d) Repeat part (b) for the following design matrix

New design matrix = X3 := ┌α卜 α 1 + α3 α 1 - α3 α字 αA ┐ e R1卜卜x3?

Problem 6 (25 points) 5+5+5+5+5 = 25 The following sample correlation matrix C among 5 variables is given

## lpsa lcavol lweight lcp lbph ## lpsa 1.0000000 0.7344603 0.4333194 0.548813175 0.179809404 ## lcavol 0.7344603 1.0000000 0.2805214 0.675310484 0.027349703 ## lweight 0.4333194 0.2805214 1.0000000 0.164537142 0.442264399 ## lcp 0.5488132 0.6753105 0.1645371 1.000000000 -0.006999431 ## lbph 0.1798094 0.0273497 0.4422644 -0.006999431 1.000000000

We take variable “lpsa” as the response 』 and consider the rest of variables as potential predictors (i.e., covariates).

(a) Among all the 2-variable linear models for “lpsa”, which model will have the smallest variance

inﬂation factor (VIF) for the estimated coeﬃcients of the predictors? Which model will have the largest VIF?

For example, one 2-variable model regresses lpsa on (lcavol, lcp}, another one regresses the same response on (lweight, lcp}, and so on. There are ╱3(A)∶ = 6 such models. Each model also includes an intercept.

(b) Suppose that we ﬁt the regression model

lpsa ~ β卜 + β1 lcavol

and the estimated coeﬃcients are

## (Intercept) ## 1.5072975

lcavol

0.7193204

We then form the residual vector from this model, call it e(1} , and ﬁt the regression model e(1} ~ γ卜 + γ1 lcavol.