STAC67: Regression Analysis

Assignment 3

(Total: 100 points)

Please submit your R Markdown file for Q. 4 and Q. 5 along with your assignment submission.

Q. 1 (24 points) Prove the following statements.

(a) (4 pts) SSR (the regression sum of squares) in matrix notation is:

$SSR = Y'\left(H - \frac{1}{n}J\right)Y$

(b) (4 pts) Show that $\frac{1}{n}J$, $H - \frac{1}{n}J$, and $I - H$ are idempotent and pairwise orthogonal (i.e. the product of each pair is the zero matrix).

\ J

(d) (4 pts) Show that [...] is distributed as $\chi^2_{n-p'}$, that is, chi-square with $n - p'$ degrees of freedom.

(e) (4 pts) Show that [...] and [...] are independent.

(f) (4 pts) We consider the general linear hypothesis test:

$H_0: K'\beta = m$ vs. $H_a: K'\beta \neq m$
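The claims in Q. 1(b) can be sanity-checked numerically before attempting the proof. The sketch below is a check on one small example, not a proof. It assumes a simple linear regression with an intercept, and the predictor values and helper functions (`matmul`, `transpose`, `sub`, `close`) are made up for illustration.

```python
# Numerical sanity check (not a proof) of Q. 1(b) on a tiny made-up design.
# Assumption: simple linear regression with an intercept, so X is n x 2.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def close(A, B, tol=1e-9):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

x = [1.0, 2.0, 4.0, 7.0]            # hypothetical predictor values
n = len(x)
X = [[1.0, xi] for xi in x]         # design matrix with an intercept column

# Invert the 2 x 2 matrix X'X in closed form.
XtX = matmul(transpose(X), X)
(a, b), (c, d) = XtX
det = a * d - b * c
XtX_inv = [[d / det, -b / det], [-c / det, a / det]]

H = matmul(matmul(X, XtX_inv), transpose(X))      # hat matrix X (X'X)^{-1} X'
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
Jn = [[1.0 / n] * n for _ in range(n)]            # (1/n) J
zero = [[0.0] * n for _ in range(n)]

mats = {"J/n": Jn, "H - J/n": sub(H, Jn), "I - H": sub(I, H)}

# Idempotent: M M = M.  Pairwise orthogonal: M1 M2 = 0.
idempotent = {name: close(matmul(M, M), M) for name, M in mats.items()}
names = list(mats)
orthogonal = {
    (p, q): close(matmul(mats[p], mats[q]), zero)
    for i, p in enumerate(names) for q in names[i + 1:]
}
print(idempotent)
print(orthogonal)
```

The orthogonality of $\frac{1}{n}J$ and $H - \frac{1}{n}J$ depends on the model having an intercept, which is why the design matrix above includes a column of ones.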

Q. 2 (10 points) A researcher fits a multiple linear regression model relating the yield ($Y$) of a chemical process to temperature ($x_1$) and the amounts of two additives ($x_2$ and $x_3$, respectively). She fits the following model:

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

She wishes to test the following three hypotheses simultaneously:

● The mean response when $x_1 = 70$, $x_2 = 10$, $x_3 = 10$ is 80

● The average yield increases by 4 units when temperature increases by 1, controlling for $x_2$ and $x_3$

● The partial effect of increasing each additive is the same (controlling for all other factors)

(a) Specify the matrix $K'$ and the vectors $\beta$ and $m$ for the hypothesis she is testing (this is her null hypothesis):

$H_0: K'\beta = m$

(b) She obtains the following results from fitting the regression based on $n = 24$ measurements taken while conducting the experiment:

$(K'\hat{\beta} - m)'\,(K'(X'X)^{-1}K)^{-1}\,(K'\hat{\beta} - m) = 1800$; $Y'(I - H)Y = 7800$
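One natural use of the two quantities in Q. 2(b) is the general linear test statistic, $F = \dfrac{Q/q}{SSE/(n - p')}$, where $Q$ is the quadratic form and $SSE = Y'(I - H)Y$. The sketch below works through that arithmetic. The values `q = 3` (one row of $K'$ per hypothesis) and `p_prime = 4` (parameters $\beta_0$ through $\beta_3$) are assumptions read off the problem setup, not numbers stated in the text.

```python
# Given in Q. 2(b)
quad_form = 1800.0   # (K'b - m)' [K'(X'X)^{-1} K]^{-1} (K'b - m), with b the estimate of beta
sse = 7800.0         # Y'(I - H)Y
n = 24

# Assumptions inferred from the problem setup (not stated numerically there):
q = 3        # three hypotheses tested simultaneously, so K' has 3 rows
p_prime = 4  # number of regression parameters: beta_0, ..., beta_3

mse = sse / (n - p_prime)          # error mean square
f_stat = (quad_form / q) / mse     # general linear test statistic
print(round(f_stat, 4), "on", q, "and", n - p_prime, "degrees of freedom")
```

Under these assumptions the statistic would be compared with the appropriate critical value of the $F$ distribution with 3 and 20 degrees of freedom.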

Q. 3 (20 points) Suppose that $X$ is a categorical variable with 3 levels (A, B, C) and we define the indicator variables $I_1$ and $I_2$ as:

$I_1 = 1$ if $X = A$, and $0$ otherwise;

$I_2 = 1$ if $X = B$, and $0$ otherwise.

For a continuous response variable $Y$, consider fitting the linear model $Y = \beta_0 + \beta_1 I_1 + \beta_2 I_2 + \varepsilon$.

We take a total sample of $n$ individuals. Let $n_A$, $n_B$, $n_C$ be the number of individuals in each category of $X$, and let $\bar{y}_A$, $\bar{y}_B$, $\bar{y}_C$ be the sample means of $Y$ for individuals in each category of $X$.

(a) (5 pts) Find $X'X$ and $X'Y$.

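(b) (10 pts) Show that the least squares estimates are

$\hat{\beta}_0 = \bar{y}_C$, $\hat{\beta}_1 = \bar{y}_A - \bar{y}_C$, $\hat{\beta}_2 = \bar{y}_B - \bar{y}_C$

using both of the following options (5 points each).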

(option 1) $\hat{\beta} = (X'X)^{-1}X'Y$.

(option 2) For any parameter values $\beta_0$, $\beta_1$, $\beta_2$, we need to minimize the sum of squared errors

$S(\beta_0, \beta_1, \beta_2) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 I_{1i} - \beta_2 I_{2i})^2$.

(c) (5 pts) Let $s_A^2$, $s_B^2$, $s_C^2$ be the usual sample variances of $Y$ for individuals in each category of $X$. Show that the error sum of squares can be written as

$SSE = (n_A - 1)s_A^2 + (n_B - 1)s_B^2 + (n_C - 1)s_C^2$
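The results in Q. 3 can be checked on a small example before working through the algebra. The sketch below uses made-up data, and it assumes the indicator coding $I_1 = 1$ for level A and $I_2 = 1$ for level B, with C as the reference level. It solves the normal equations $(X'X)\hat{\beta} = X'Y$ with a small hand-written elimination routine (`solve` is a hypothetical helper, not a library function). It then compares the estimates with the group means, and the residual sum of squares with the pooled-variance expression.

```python
from statistics import mean, variance

# Hypothetical data: responses grouped by the level of X.
groups = {"A": [4.0, 6.0, 5.0], "B": [9.0, 7.0], "C": [1.0, 2.0, 3.0, 2.0]}

# Design rows (1, I1, I2), assuming I1 = 1 for level A and I2 = 1 for level B.
rows, y = [], []
for level, values in groups.items():
    for v in values:
        rows.append([1.0, float(level == "A"), float(level == "B")])
        y.append(v)

# Normal equations (X'X) b = X'Y.
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
XtY = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]

def solve(A, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    m = len(rhs)
    M = [A[i][:] + [rhs[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]

b0, b1, b2 = solve(XtX, XtY)
means = {k: mean(v) for k, v in groups.items()}

# Residual sum of squares from the fitted model versus the pooled-variance form.
sse_fit = sum((yi - (b0 + b1 * r[1] + b2 * r[2])) ** 2 for r, yi in zip(rows, y))
sse_pooled = sum((len(v) - 1) * variance(v) for v in groups.values())

print("estimates:", b0, b1, b2)
print("group means:", means)
print("SSE (fit) vs SSE (pooled):", sse_fit, sse_pooled)
```

Here `statistics.variance` is the sample variance with divisor $n_k - 1$, which is the quantity that makes the identity in part (c) hold.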

Q. 4 (20 points) The public health department wished to study the relation between the average estimated probability of acquiring an infection in the hospital (Infections, in percent; higher is worse) and three predictors: the average length of stay of all patients in the hospital (StayLength, in days, $X_1$), the average age of patients (Age, in years, $X_2$), and the average number of beds in the hospital during the study period (Beds, $X_3$). The data file, “Infections.csv”, can be found on Quercus. Please ignore the other three variables (MedSchool, Region, and Nurses) for this question.

(a) (4 pts) Obtain the scatter plot matrix and the correlation matrix. Interpret these and state your principal findings. Is there any concern about multicollinearity?

(b) (4 pts) Fit the regression model with the three predictor variables to the data and state the estimated regression function. How is $\hat{\beta}_2$ interpreted here?

(c) (4 pts) Test whether there is a regression relation; use $\alpha = 0.05$. State the alternatives, decision rule, and conclusion. What does your test imply about $\beta_1$, $\beta_2$, and $\beta_3$? What is the P-value of the test?

(d) (4 pts) Calculate the coefficient of determination and the adjusted coefficient of determination. What do they indicate here?

(e) (4 pts) Obtain a 90% prediction interval for a new hospital infection rate when StayLength = 10, Age = 45, and Beds = 150. Interpret your prediction interval.

Q. 5 (26 points) We will use the same dataset as in Question 4, “Infections.csv”, for this question. The variables that will be used are described below:

● Infections (Y): the average estimated probability of acquiring an infection in the hospital, in percent; higher is worse

● Beds: the average number of beds in hospital during study period

● Region: geographic region (NE = Northeast, NC = North Central, S = South, W = West)

(a) (8 pts) Write down the full model with the interaction terms. Fit the full model in R. Compute the estimated regression function for each geographic region and plot them.

(b) (4 pts) Test whether the slopes relating the average number of beds to infections are the same for each geographic region at the $\alpha = 0.05$ significance level.

(d) (6 pts) For the model you chose in (c), check and comment on the standard assumptions of the regression model.

(e) (6 pts) Look for a transformation of Y and/or X (= Beds). Fit the regression with the transformed variable(s) without interaction and comment on whether this model fits better.

2022-11-11
