Linear Regression Models, Homework 2

2022

1. Let $x_1, \dots, x_n \in \mathbb{R}$, and assume that the $x_i$ are not all equal. Recall the simple linear regression model with normally distributed errors:

$$y_i = \beta_0 + x_i\beta_1 + \varepsilon_i, \quad i = 1, \dots, n,$$

where $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. Or, equivalently, we can write

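$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$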

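Define

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix};$$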

then the normal simple linear regression model can be expressed in matrix terms as

$$Y = X\beta + \varepsilon, \quad \text{where } \varepsilon \sim N(0, \sigma^2 I_n).$$

Let $\|\cdot\|$ be the $L_2$-norm of an $n$-dimensional vector, defined for $a \in \mathbb{R}^{n \times 1}$ as $\|a\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$.

The sum of squares of errors can be expressed as $Q(\beta) = \|Y - X\beta\|^2$.

(a) Derive the MSE minimizer in matrix form, using only $Y$ and $X$.

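Hint: in matrix calculus (denominator layout), for a fixed vector $x$ and a fixed square matrix $\Sigma$, we have

$$\frac{\partial\, x^\top \beta}{\partial \beta} = x, \qquad \frac{\partial\, \beta^\top \Sigma \beta}{\partial \beta} = (\Sigma + \Sigma^\top)\beta.$$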
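The two hint identities can be checked numerically before relying on them. The sketch below (Python with NumPy) assumes gradients are column vectors, i.e. denominator layout, and uses randomly generated $x$, $\Sigma$ and $\beta$ that are purely illustrative; it compares central finite-difference gradients with the closed forms:

```python
import numpy as np

# Illustrative random inputs; any fixed x, Sigma and beta would do.
rng = np.random.default_rng(0)
p = 3
x = rng.normal(size=p)            # fixed vector x
Sigma = rng.normal(size=(p, p))   # fixed square matrix, deliberately NOT symmetric
beta = rng.normal(size=p)         # point at which the gradients are evaluated


def numerical_gradient(f, b, h=1e-6):
    """Central finite-difference approximation to the gradient of a scalar function f at b."""
    grad = np.zeros_like(b)
    for j in range(b.size):
        step = np.zeros_like(b)
        step[j] = h
        grad[j] = (f(b + step) - f(b - step)) / (2 * h)
    return grad


grad_linear = numerical_gradient(lambda b: x @ b, beta)             # d(x^T beta) / d beta
grad_quadratic = numerical_gradient(lambda b: b @ Sigma @ b, beta)  # d(beta^T Sigma beta) / d beta

print(np.allclose(grad_linear, x, atol=1e-6))                            # True
print(np.allclose(grad_quadratic, (Sigma + Sigma.T) @ beta, atol=1e-5))  # True
```

A non-symmetric $\Sigma$ is used on purpose: when $\Sigma$ is symmetric the second identity reduces to $2\Sigma\beta$, which would hide the $\Sigma^\top$ term.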

(b) Now fix $p \ge 1$. Suppose $x_i \in \mathbb{R}^{p \times 1}$ and $\beta_1 \in \mathbb{R}^{p \times 1}$ are both $p$-dimensional vectors, i.e., we are using $p$ predictor variables to predict the one-dimensional response variable $Y$. We write

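$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix} = \begin{pmatrix} 1 & x_1^\top \\ 1 & x_2^\top \\ \vdots & \vdots \\ 1 & x_n^\top \end{pmatrix}$$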

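where $\beta = (\beta_0, \beta_1^\top)^\top \in \mathbb{R}^{(p+1) \times 1}$, and we assume that $\operatorname{rank}(X) = p + 1$. Will the minimizer of $Q(\beta)$ still take the same form as in (a)? Derive it.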
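A quick numerical sanity check for (b) is to solve the normal equations $X^\top X \beta = X^\top Y$ on simulated data and compare the result with NumPy's general least-squares solver. This is a sketch with made-up data (the sizes `n`, `p` and the coefficients are arbitrary assumptions), not a substitute for the derivation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
predictors = rng.normal(size=(n, p))            # row i plays the role of x_i^T (simulated)
X = np.column_stack([np.ones(n), predictors])   # n x (p + 1) design matrix with intercept column
beta_true = np.array([2.0, 1.0, -0.5, 0.25])    # arbitrary "true" coefficients for the simulation
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

assert np.linalg.matrix_rank(X) == p + 1        # the full-rank assumption from part (b)

# Solve X^T X b = X^T Y directly instead of forming an explicit inverse.
beta_normal_eq = np.linalg.solve(X.T @ X, X.T @ Y)

# Reference solution from NumPy's general least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(beta_normal_eq, beta_lstsq))  # True
```

`np.linalg.solve` is used rather than computing $(X^\top X)^{-1}$ explicitly, which is the usual numerically safer choice; both approaches rely on the rank condition stated in (b).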

(d) Recall that the residual vector is defined as $e = Y - \hat{Y}$, where $\hat{Y} = X\hat{\beta}$. Show that $e^\top X = 0$ and $e^\top \hat{Y} = 0$.
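The two identities in (d) can be observed numerically on simulated data; this is a check, not a proof. The sketch assumes a simple linear regression with made-up coefficients:

```python
import numpy as np

# Simulated data from a simple linear regression (hypothetical coefficients).
rng = np.random.default_rng(2)
n = 40
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
Y = 1.5 + 0.8 * x + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # least-squares estimate
Y_hat = X @ beta_hat                           # fitted values
e = Y - Y_hat                                  # residual vector

print(np.allclose(e @ X, 0, atol=1e-8))        # True: e^T X is the zero row vector up to rounding
print(np.isclose(e @ Y_hat, 0, atol=1e-8))     # True: e^T Y_hat is zero up to rounding
```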

(e) Let $\mathbf{1}_n$ denote the $n$-dimensional all-one column vector. Recall the hat matrix $H = X(X^\top X)^{-1}X^\top$. Calculate the value of $H \mathbf{1}_n$.
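A small numerical experiment can suggest what $H\mathbf{1}_n$ should be before you prove it. The data below are random and purely illustrative; note that the first column of $X$ is $\mathbf{1}_n$ itself:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
x = rng.normal(size=n)                      # illustrative predictor values
X = np.column_stack([np.ones(n), x])        # first column is the all-one vector
H = X @ np.linalg.solve(X.T @ X, X.T)       # H = X (X^T X)^{-1} X^T without an explicit inverse
ones = np.ones(n)

print(H @ ones)                             # compare this vector with ones
```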

2. Continue with the Copier maintenance data introduced in the previous homework assignment; recall that X denotes the number of copiers serviced and Y the total number of minutes spent on a service call. Assume the normal simple linear regression model is appropriate.

(a) Conduct a t-test to determine whether or not there is a linear association between X and Y. Clearly state the null and alternative hypotheses in terms of model parameters. Report and interpret the p-value from your test.
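The Copier maintenance data are not reproduced here, so the sketch below runs on simulated numbers. It assumes `x` holds the number of copiers serviced and `y` the service minutes; `slope_t_test` is a hypothetical helper name. It computes $t^* = (b_1 - \beta_{1,0}) / s\{b_1\}$ on $n - 2$ degrees of freedom with a two-sided p-value, which for $\beta_{1,0} = 0$ is the same test that `scipy.stats.linregress` reports:

```python
import numpy as np
from scipy import stats


def slope_t_test(x, y, beta1_null=0.0):
    """Two-sided t-test of H0: beta1 = beta1_null in simple linear regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx   # least-squares slope
    b0 = y.mean() - b1 * x.mean()                         # least-squares intercept
    mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)        # estimate of sigma^2 on n - 2 df
    se_b1 = np.sqrt(mse / sxx)                            # s{b1}
    t_stat = (b1 - beta1_null) / se_b1
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    return b1, se_b1, t_stat, p_value


# Simulated stand-in data (NOT the Copier maintenance data).
rng = np.random.default_rng(4)
x_demo = rng.integers(1, 11, size=45).astype(float)
y_demo = 10.0 + 1.0 * x_demo + rng.normal(scale=9.0, size=45)

b1, se_b1, t_stat, p_value = slope_t_test(x_demo, y_demo)
print(f"b1 = {b1:.3f}, s(b1) = {se_b1:.3f}, t* = {t_stat:.3f}, p = {p_value:.4f}")
```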

(b) Use a 95% confidence interval to estimate the change in mean service time when the number of copiers serviced increases by one. Interpret your confidence interval.
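Under the same assumptions (simulated data, hypothetical helper name), a sketch of the interval $b_1 \pm t(1 - \alpha/2;\, n - 2)\, s\{b_1\}$:

```python
import numpy as np
from scipy import stats


def slope_confidence_interval(x, y, level=0.95):
    """Interval b1 +/- t(1 - alpha/2; n - 2) * s{b1} for the slope of a simple linear regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
    se_b1 = np.sqrt(mse / sxx)
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)   # t(1 - alpha/2; n - 2)
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1


# Simulated stand-in data (NOT the Copier maintenance data).
rng = np.random.default_rng(4)
x_demo = rng.integers(1, 11, size=45).astype(float)
y_demo = 10.0 + 1.0 * x_demo + rng.normal(scale=9.0, size=45)

lower, upper = slope_confidence_interval(x_demo, y_demo)
print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f})")
```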

(c) The manufacturer has suggested that the mean required time should not increase by more than 14 minutes for each additional copier serviced on a service call. Evaluate this claim in two ways:

i. by inspection of your confidence interval in part (b), and

ii. by conducting a formal significance test of the appropriate hypotheses, reporting and interpreting the p-value from the test.

Are your conclusions consistent?
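For (c)ii, one natural formulation (an assumption here, since choosing the hypotheses is part of the exercise) is $H_0: \beta_1 \le 14$ versus $H_a: \beta_1 > 14$, with $t^* = (b_1 - 14)/s\{b_1\}$ and an upper-tail p-value. A sketch on simulated data, with a hypothetical helper name:

```python
import numpy as np
from scipy import stats


def upper_tail_slope_test(x, y, beta1_null=14.0):
    """One-sided test of H0: beta1 <= beta1_null versus Ha: beta1 > beta1_null."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
    se_b1 = np.sqrt(mse / sxx)
    t_stat = (b1 - beta1_null) / se_b1
    p_value = stats.t.sf(t_stat, df=n - 2)   # P(T > t*), evaluated at the boundary beta1 = beta1_null
    return b1, t_stat, p_value


# Simulated stand-in data (NOT the Copier maintenance data).
rng = np.random.default_rng(4)
x_demo = rng.integers(1, 11, size=45).astype(float)
y_demo = 15.0 * x_demo + rng.normal(scale=9.0, size=45)

b1, t_stat, p_value = upper_tail_slope_test(x_demo, y_demo, beta1_null=14.0)
print(f"b1 = {b1:.3f}, t* = {t_stat:.3f}, one-sided p = {p_value:.4f}")
```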

3. Continue with the SENIC Project data, but this time we consider regressing infection risk Risk against average length of stay Stay, average age of patients Age, routine chest X-ray ratio Xray (three continuous predictors), and medical school affiliation MS, which takes the value 1 if Yes and 2 if No.

(a) Recode the affiliation variable MS so that it takes the value 1 if Yes and 0 if No.
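A minimal pandas sketch of the recoding in (a). The DataFrame name `senic` and the toy values are hypothetical stand-ins for however the SENIC data are loaded; only the column name `MS` and its 1/2 coding come from the problem:

```python
import pandas as pd

# Toy stand-in for the SENIC data: only the MS column, coded 1 = Yes, 2 = No.
senic = pd.DataFrame({"MS": [2, 2, 1, 1, 2]})

# Recode MS: 1 (Yes) stays 1, 2 (No) becomes 0. Any other code would become NaN.
senic["MS"] = senic["MS"].map({1: 1, 2: 0})
assert senic["MS"].notna().all(), "unexpected MS code found"

print(senic["MS"].tolist())  # [0, 0, 1, 1, 0]
```

`Series.map` with a dictionary turns any unexpected code into NaN, so the assertion flags data-entry problems instead of silently mislabeling them.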

(b) Prepare a scatterplot matrix of the response and the three continuous predictor variables, where data points corresponding to hospitals with a medical school affiliation are indicated by a different plotting symbol. Describe the relationships among the variables.
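One way to draw the plot in (b) with matplotlib, sketched under assumptions: the DataFrame has columns `Risk`, `Stay`, `Age`, `Xray` and an already recoded 0/1 `MS`, the helper `grouped_scatter_matrix` is hypothetical, and the demo data are random numbers rather than SENIC values. Histograms sit on the diagonal and the two affiliation groups get different markers:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def grouped_scatter_matrix(df, columns, group_col):
    """Scatterplot matrix of `columns`; assumes `group_col` is coded 1 (affiliated) / 0 (not)."""
    k = len(columns)
    fig, axes = plt.subplots(k, k, figsize=(2.5 * k, 2.5 * k))
    styles = {1: ("^", "MS = 1 (affiliated)"), 0: ("o", "MS = 0 (not affiliated)")}
    for i, row_var in enumerate(columns):
        for j, col_var in enumerate(columns):
            ax = axes[i, j]
            if i == j:
                ax.hist(df[row_var], bins=10)  # marginal distribution on the diagonal
            else:
                for level, (marker, label) in styles.items():
                    group = df[df[group_col] == level]
                    ax.scatter(group[col_var], group[row_var], marker=marker, s=12, label=label)
            if i == k - 1:
                ax.set_xlabel(col_var)
            if j == 0:
                ax.set_ylabel(row_var)
    axes[0, 1].legend(fontsize="small")
    fig.tight_layout()
    return fig, axes


# Random demo data using the SENIC column names (NOT the real SENIC values).
rng = np.random.default_rng(5)
demo = pd.DataFrame(rng.normal(size=(60, 4)), columns=["Risk", "Stay", "Age", "Xray"])
demo["MS"] = rng.integers(0, 2, size=60)
fig, axes = grouped_scatter_matrix(demo, ["Risk", "Stay", "Age", "Xray"], "MS")
# fig.savefig("senic_scatter_matrix.png") would write the figure to disk.
```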

(c) Letting $Y$ denote the response, $X_1, X_2, X_3$ the continuous predictors, and $X_4$ the indicator variable for medical school affiliation, fit the mean functions

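$$E[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$$

and

$$E[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_{14} x_1 x_4 + \beta_{24} x_2 x_4 + \beta_{34} x_3 x_4$$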

assuming constant variance and normality in both cases.

i. Explain in plain English what each of these models means exactly. (You don’t have to include this in your answer, but you should know the interpretation of every single parameter in both mean functions.)

ii. Conduct an F-test of the reduced model versus the full model, that is, a test of the null hypothesis

$$H_0: \beta_{14} = \beta_{24} = \beta_{34} = 0 \quad \text{vs.} \quad H_1: \text{otherwise}.$$

What is your conclusion?
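A sketch of the two fits in (c) and the general linear F-test in (c)ii, $F^* = \frac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F/df_F}$, using NumPy and SciPy. The simulated predictors stand in for Stay, Age, Xray and the recoded MS; the helper names are hypothetical:

```python
import numpy as np
from scipy import stats


def sse_of_fit(X, y):
    """Error sum of squares from an ordinary least-squares fit of y on the columns of X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta_hat) ** 2))


def partial_f_test(X_reduced, X_full, y):
    """General linear F-test of a reduced model nested inside a full model."""
    n = y.size
    df_r = n - X_reduced.shape[1]   # error df of the reduced model
    df_f = n - X_full.shape[1]      # error df of the full model
    sse_r, sse_f = sse_of_fit(X_reduced, y), sse_of_fit(X_full, y)
    f_stat = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
    p_value = stats.f.sf(f_stat, df_r - df_f, df_f)
    return f_stat, p_value


# Simulated stand-ins for Stay (x1), Age (x2), Xray (x3) and MS (x4); NOT the SENIC data.
rng = np.random.default_rng(6)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
x4 = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + 0.5 * x1 + 0.2 * x2 + 0.3 * x3 + 0.4 * x4 + rng.normal(size=n)

X_reduced = np.column_stack([np.ones(n), x1, x2, x3, x4])         # beta0, ..., beta4
X_full = np.column_stack([X_reduced, x1 * x4, x2 * x4, x3 * x4])  # adds beta14, beta24, beta34
f_stat, p_value = partial_f_test(X_reduced, X_full, y)
print(f"F* = {f_stat:.3f} on ({X_full.shape[1] - X_reduced.shape[1]}, {n - X_full.shape[1]}) df, p = {p_value:.3f}")
```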

(d) Working with the reduced model, estimate the effect of medical school affiliation on infection risk using a 95% confidence interval. Interpret your interval estimate.
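For (d), the interval for $\beta_4$ in the reduced model is $b_4 \pm t(1 - \alpha/2;\, n - 5)\, s\{b_4\}$, where $s^2\{b_4\}$ is the corresponding diagonal entry of $MSE\,(X^\top X)^{-1}$. A sketch with a hypothetical helper and simulated data in place of SENIC:

```python
import numpy as np
from scipy import stats


def coef_confidence_interval(X, y, index, level=0.95):
    """t-based confidence interval for coefficient `index` of the linear model y = X beta + error."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)          # needed explicitly for its diagonal entries
    beta_hat = xtx_inv @ (X.T @ y)
    resid = y - X @ beta_hat
    mse = float(resid @ resid) / (n - k)      # error variance estimate on n - k df
    se = np.sqrt(mse * xtx_inv[index, index])
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - k)
    return beta_hat[index] - t_crit * se, beta_hat[index] + t_crit * se


# Simulated stand-ins for Stay, Age, Xray and MS (NOT the SENIC data).
rng = np.random.default_rng(11)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
x4 = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + 0.5 * x1 + 0.2 * x2 + 0.3 * x3 + 0.4 * x4 + rng.normal(size=n)

X_reduced = np.column_stack([np.ones(n), x1, x2, x3, x4])
lower, upper = coef_confidence_interval(X_reduced, y, index=4)   # column 4 holds x4 (MS)
print(f"95% CI for beta4: ({lower:.3f}, {upper:.3f})")
```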

4. (a) In a test of $H_0: \beta_1 \le 0$ versus $H_a: \beta_1 > 0$, we fail to reject $H_0$, and an analyst concludes that there is no linear association between $X$ and $Y$. Do you agree? Explain.
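A small simulation can help probe the analyst's reasoning. The sketch below uses simulated data with a deliberately negative slope (nothing here comes from the assignment) and compares the one-sided p-value for $H_a: \beta_1 > 0$ with the two-sided p-value for $H_a: \beta_1 \neq 0$:

```python
import numpy as np
from scipy import stats

# Simulated data with a clearly NEGATIVE slope (hypothetical, for illustration only).
rng = np.random.default_rng(8)
x = np.linspace(0, 10, 30)
y = 5.0 - 2.0 * x + rng.normal(scale=1.0, size=x.size)

res = stats.linregress(x, y)
t_stat = res.slope / res.stderr
p_upper = stats.t.sf(t_stat, df=x.size - 2)   # p-value for H0: beta1 <= 0 vs Ha: beta1 > 0
print(f"slope = {res.slope:.2f}")
print(f"one-sided p-value (Ha: beta1 > 0): {p_upper:.3f}")
print(f"two-sided p-value (Ha: beta1 != 0): {res.pvalue:.2e}")
```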

(b) The same analyst later claims that “estimating the mean response at $x = x_0$” and “predicting the mean of $m$ new observations at $x = x_0$” are essentially the same problem. Do you agree? Explain.
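Under the normal simple linear regression model, the standard textbook formulas are $s^2\{\hat{Y}_h\} = MSE\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right]$ for the estimated mean response and $s^2\{\text{predmean}\} = MSE\left[\frac{1}{m} + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right]$ for predicting the mean of $m$ new observations. The sketch below (hypothetical helper name, simulated data) computes both so they can be compared as $m$ varies:

```python
import numpy as np


def mean_vs_new_mean_standard_errors(x, y, x0, m):
    """Return (s{Yhat_h}, s{predmean}) at x0 for a simple linear regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
    base = 1 / n + (x0 - x.mean()) ** 2 / sxx          # uncertainty of the fitted line at x0
    se_mean_response = np.sqrt(mse * base)             # for estimating E[Y | x0]
    se_new_mean = np.sqrt(mse * (1 / m + base))        # adds the m new observations' own noise
    return se_mean_response, se_new_mean


# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(9)
x_demo = rng.uniform(0, 10, size=30)
y_demo = 3.0 + 2.0 * x_demo + rng.normal(size=30)

for m in (1, 5, 100):
    se_mean, se_new = mean_vs_new_mean_standard_errors(x_demo, y_demo, x0=5.0, m=m)
    print(f"m = {m:3d}: s(Yhat_h) = {se_mean:.3f}, s(predmean) = {se_new:.3f}")
```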