
Assignment/Quiz 2 Questions

1. Assume a Gaussian linear model:

$$Y \sim N(X\beta, \sigma^2 I_n),$$

where $X \in \mathbb{R}^{n\times p}$, $\beta \in \mathbb{R}^p$, and $\sigma^2$ are a fixed/given matrix, vector, and scalar, respectively.

(a) Write down the joint pdf $g(y \mid \beta, \sigma^2, X)$.

(b) Compute the maximum likelihood estimate of $\beta$:

$$\hat\beta = \operatorname*{argmax}_{\beta} \, \ln g(y \mid \beta, \sigma^2, X).$$

(c) Compute the maximum likelihood estimate of $\sigma^2$:

$$\hat\sigma^2 = \operatorname*{argmax}_{\sigma^2} \, \ln g(y \mid \hat\beta, \sigma^2, X).$$
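For reference, a sketch of the standard answers (assuming, for the closed forms, that $X$ has full column rank so that $X^\top X$ is invertible):

$$g(y \mid \beta, \sigma^2, X) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{\|y - X\beta\|^2}{2\sigma^2}\right), \qquad \hat\beta = (X^\top X)^{-1} X^\top y, \qquad \hat\sigma^2 = \frac{\|y - X\hat\beta\|^2}{n}.$$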

2. Suppose that $Y \sim N(X\beta, \sigma^2 I_n)$, where $X \in \mathbb{R}^{n\times p}$, $\beta \in \mathbb{R}^p$, and $\sigma^2$ are a fixed/given matrix, vector, and scalar, respectively. Let

$$\hat\beta = X^+ Y, \qquad \hat\sigma^2 = \|Y - X\hat\beta\|^2/(n - r),$$

where $r = \operatorname{rank}(X)$. Show that:

(a)

$$E\hat\beta = \beta,$$

when $\operatorname{rank}(X) = r = p$.

solution: Since $X$ has full column rank, we have

$$X^+ = (X^\top X)^+ X^\top = (X^\top X)^{-1} X^\top.$$

Hence, $E\hat\beta = X^+ X\beta = \beta$.

(b)

$$E\hat\sigma^2 = \sigma^2,$$

using the identity $E[X^\top A X] = \mu^\top A \mu + \operatorname{tr}(A \operatorname{Var}(X))$, where $\mu = EX$.

solution: For this part, using the projection property $(I - XX^+)^2 = I - XX^+$:

$$\begin{aligned}
(n - r)\,E\hat\sigma^2 &= E\|Y - X\hat\beta\|^2 \\
&= E\|Y - XX^+ Y\|^2 \\
&= E\|(I - XX^+)Y\|^2 \\
&= E\big[Y^\top (I - XX^+)^\top (I - XX^+) Y\big] \\
&= E\big[Y^\top (I - XX^+) Y\big] \\
&= E[Y]^\top (I - XX^+) E[Y] + \operatorname{tr}\big((I - XX^+)\operatorname{Var}(Y)\big) \\
&= E[Y]^\top (I - XX^+) X\beta + \sigma^2 \operatorname{tr}(I_n - XX^+).
\end{aligned}$$

Therefore, using

$$(I - XX^+)X = X - XX^+X = X - X = 0,$$

we obtain

$$(n - r)\,E\hat\sigma^2 = \sigma^2 \operatorname{tr}(I_n - XX^+) = \sigma^2 (n - r),$$

because, using the SVD $X = USV^\top$, we can write

$$X^+ X = (USV^\top)^+ USV^\top = (VS^+U^\top)USV^\top = VS^+SV^\top = V \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} V^\top,$$

and hence

$$\operatorname{tr}(XX^+) = \operatorname{tr}(X^+X) = \operatorname{tr}\left( V \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} V^\top \right) = r.$$

If $X$ has full column rank, then this is simpler to write, because

$$\operatorname{tr}(XX^+) = \operatorname{tr}(X^+X) = \operatorname{tr}\big((X^\top X)^{-1} X^\top X\big) = \operatorname{tr}(I_p) = p = r.$$

In summary, an unbiased estimator of $\sigma^2$ is

$$\hat\sigma^2 = \frac{\|Y - XX^+Y\|^2}{n - p}.$$

Recall that in first year you are taught that $s^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar y)^2$ is unbiased for $\sigma^2$; this is the special case in which $X = \mathbf{1}$ is a single constant column, so that $p = 1$.
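As a numerical sanity check (a minimal sketch in Python with a simulated, arbitrary design matrix; not part of the original solution), we can verify that $\operatorname{tr}(XX^+) = r$ and that $\hat\sigma^2$ averages to $\sigma^2$ over many replications:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 50, 3, 4.0
X = rng.normal(size=(n, p))           # a generic full-rank design, so r = p
beta = rng.normal(size=p)
r = np.linalg.matrix_rank(X)

P = X @ np.linalg.pinv(X)             # projection XX^+; its trace equals r
print(np.trace(P), r)                 # ~3.0 and 3

# Average sigma^2-hat = ||(I - XX^+)Y||^2 / (n - r) over many replications.
est = []
for _ in range(10_000):
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    est.append(np.sum(((np.eye(n) - P) @ Y) ** 2) / (n - r))
print(np.mean(est))                   # close to sigma2 = 4.0
```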

(c)

$$X^+ (X^+)^\top = (X^\top X)^+.$$

solution: We prove this by simply substituting and using $X^+ = (X^\top X)^+ X^\top$:

$$\begin{aligned}
X^+ (X^+)^\top &= (X^\top X)^+ X^\top \big((X^\top X)^+ X^\top\big)^\top \\
&= (X^\top X)^+ X^\top X \big[(X^\top X)^+\big]^\top \\
&= (X^\top X)^+ X^\top X (X^\top X)^+ \\
&= (X^\top X)^+,
\end{aligned}$$

where the last step uses the Moore-Penrose property $A^+ A A^+ = A^+$ with $A = X^\top X$ (note that $(X^\top X)^+$ is symmetric).

(d)

$$\begin{bmatrix} \hat\beta \\ Y - X\hat\beta \end{bmatrix} \sim N\left( \begin{bmatrix} X^+X\beta \\ 0 \end{bmatrix}, \; \sigma^2 \begin{bmatrix} (X^\top X)^+ & 0 \\ 0 & I_n - XX^+ \end{bmatrix} \right).$$

Hence, deduce that $\hat\beta$ is independent of $\|Y - X\hat\beta\|^2$.

solution: We know that $Y$ is multivariate Gaussian and any linear transformation of a Gaussian variable yields another multivariate Gaussian. Thus, from

$$\begin{bmatrix} \hat\beta \\ Y - X\hat\beta \end{bmatrix} = \underbrace{\begin{bmatrix} X^+ \\ I_n - XX^+ \end{bmatrix}}_{A} Y,$$

we can conclude that $\begin{bmatrix} \hat\beta \\ Y - X\hat\beta \end{bmatrix}$ is multivariate Gaussian with mean

$$A\,E[Y] = \begin{bmatrix} X^+ \\ I_n - XX^+ \end{bmatrix} X\beta = \begin{bmatrix} X^+X\beta \\ 0 \end{bmatrix}.$$

The covariance is:

$$\operatorname{Var}(AY) = \sigma^2 AA^\top = \sigma^2 \begin{bmatrix} X^+ \\ I_n - XX^+ \end{bmatrix} \begin{bmatrix} (X^+)^\top & I_n - XX^+ \end{bmatrix} = \sigma^2 \begin{bmatrix} (X^\top X)^+ & 0 \\ 0 & I_n - XX^+ \end{bmatrix},$$

where the top-left block uses part (c), the bottom-right block uses the projection property $(I_n - XX^+)^2 = I_n - XX^+$, and the off-diagonal blocks vanish because $X^+(I_n - XX^+) = X^+ - X^+XX^+ = 0$.

Therefore,

$$\operatorname{Var}(\hat\beta) = \sigma^2 (X^\top X)^+, \qquad \operatorname{Var}(Y - X\hat\beta) = \sigma^2 (I_n - XX^+), \qquad \operatorname{Cov}(\hat\beta, \, Y - X\hat\beta) = 0.$$

Since $\hat\beta$ and $Y - X\hat\beta$ are jointly normal with zero covariance, they are independent. In other words, $\hat\beta$ and $\hat\sigma^2$ are independent. This is going to be used in the next part.

(e) If $r = p$, then

$$\hat\sigma^2 \sim \frac{\sigma^2}{n - r}\,\chi^2_{n-r}.$$

solution: We know that

$$Z^\top \Sigma^+ Z \sim \chi^2_r,$$

where $Z \sim N(0, \Sigma)$ and $r = \operatorname{rank}(\Sigma)$.

In particular, we have the quadratic form

$$Y^\top (I_n - XX^+) Y/\sigma^2 = Y^\top (I_n - XX^+)(I_n - XX^+)^+(I_n - XX^+) Y/\sigma^2 = Z^\top (I_n - XX^+)^+ Z,$$

where (using $Y \sim N(X\beta, \sigma^2 I_n)$)

$$Z = (I_n - XX^+)Y/\sigma \sim N(0, \, I_n - XX^+).$$

Indeed,

$$E[Z] = (I_n - XX^+)E[Y]/\sigma = (I_n - XX^+)X\beta/\sigma = 0$$

and

$$\operatorname{Var}(Z) = \frac{I_n - XX^+}{\sigma}\operatorname{Var}(Y)\frac{I_n - XX^+}{\sigma} = (I_n - XX^+)\,\sigma^2 I_n\,(I_n - XX^+)/\sigma^2 = I_n - XX^+.$$

Therefore,

$$Z^\top (I_n - XX^+)^+ Z \sim \chi^2_{n-r},$$

because $\operatorname{rank}(I_n - XX^+) = \operatorname{tr}(I_n - XX^+) = n - r$.

(f) If $r = p$, then

$$\frac{\hat\beta_j - E\hat\beta_j}{\hat\sigma\,\|e_j^\top X^+\|} \sim t_{n-r}.$$


solution: From part (e) we know that

$$(n - r)\,\hat\sigma^2/\sigma^2 \sim \chi^2_{n-r}.$$

From part (d), we know that $\hat\sigma^2$ is independent of $\hat\beta \sim N(E\hat\beta, \, \sigma^2 (X^\top X)^+)$, so that

$$\frac{\hat\beta_j - E[\hat\beta_j]}{\sigma\sqrt{[(X^\top X)^+]_{jj}}} \sim N(0, 1).$$

From the Quiz 1 sheet (the ratio of a standard normal to an independent $\sqrt{\chi^2_{n-p}/(n-p)}$ has a Student-t distribution), we have

$$\frac{\hat\beta_j - E[\hat\beta_j]}{\hat\sigma\sqrt{[(X^\top X)^+]_{jj}}} \sim t_{n-p}.$$

From part (c), we know that

$$[(X^\top X)^+]_{jj} = e_j^\top (X^\top X)^+ e_j = e_j^\top X^+ (X^+)^\top e_j = \|e_j^\top X^+\|^2.$$

3. For the simple linear regression $Y = \beta_0 + \beta_1 x + \epsilon$, show that $R^2$ is the same as the squared sample correlation between the response and the explanatory variable:

$$R^2 = \frac{\left(\sum_i (y_i - \bar y)(x_i - \bar x)\right)^2}{\sum_i (y_i - \bar y)^2 \, \sum_i (x_i - \bar x)^2}.$$

solution: First, from the definition in the notes on page 39, Section 2.3.7, we know that

$$R^2 = \frac{\|\hat y - \bar y\,\mathbf{1}\|^2}{\|y - \bar y\,\mathbf{1}\|^2} = \frac{\sum_i (\hat y_i - \bar y)^2}{\sum_i (y_i - \bar y)^2}.$$

For a simple linear regression, we know that

$$b_0 = \bar y - b_1 \bar x, \qquad b_1 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}.$$

Substituting these gives:

$$\begin{aligned}
R^2 = \frac{\sum_i (b_1 x_i + \bar y - b_1 \bar x - \bar y)^2}{\sum_i (y_i - \bar y)^2}
&= \frac{\sum_i \big(b_1 [x_i - \bar x]\big)^2}{\sum_i (y_i - \bar y)^2} \\
&= \frac{b_1^2 \sum_i [x_i - \bar x]^2}{\sum_i (y_i - \bar y)^2} \\
&= \frac{\left(\sum_i (x_i - \bar x)(y_i - \bar y)\right)^2}{\sum_i (y_i - \bar y)^2 \, \sum_i (x_i - \bar x)^2},
\end{aligned}$$

which completes the proof.

4. Show that $R^2_{\text{adjusted}} \leq R^2$.

solution: We recall that

$$R^2_{\text{adjusted}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p}.$$

When $n > p \geq 1$ we have that

$$\frac{n - 1}{n - p} \geq 1.$$

Therefore,

$$\frac{1 - R^2_{\text{adjusted}}}{1 - R^2} = \frac{n - 1}{n - p} \geq 1.$$

We conclude,

$$(1 - R^2) \leq 1 - R^2_{\text{adjusted}}.$$

Hence,

$$R^2_{\text{adjusted}} \leq R^2,$$

which makes sense, because $R^2_{\text{adjusted}}$ is less optimistic about the model than $R^2$ (higher $R^2$ means better fit to the data, possibly overfitting).

5. For the diabetes dataset, compute the 2-fold cross-validation loss as an estimate of the expected generalization risk of the linear learner. Report the numerical value.

solution: Without reordering the data we get: 3250.9
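A minimal sketch of the computation in Python, assuming the diabetes data are the version shipped with scikit-learn, a constant column is prepended, the data are not reordered, and the loss is the average squared prediction error:

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
X = np.column_stack([np.ones(len(y)), X])   # prepend the constant feature
n = len(y)

# 2-fold CV: train on one half, test on the other, then swap.
half = n // 2
idx = np.arange(n)
sq_errors = []
for train, test in [(idx[:half], idx[half:]), (idx[half:], idx[:half])]:
    beta = np.linalg.pinv(X[train]) @ y[train]    # least-squares fit on the fold
    sq_errors.append((y[test] - X[test] @ beta) ** 2)
print(np.mean(np.concatenate(sq_errors)))         # 2-fold CV loss
```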

6. For the diabetes dataset, compute the leave-one-out cross-validation loss (the PRESS statistic divided by n) as an estimate of the expected generalization risk of the linear learner, and report the numerical value.

Perform the computation of the leave-one-out cross-validation in two different ways: 1) using the fast PRESS statistic formula; 2) using a brute-force retraining of the linear learner.

solution: The value for n-fold CV is: 3147
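Both routes, sketched in Python under the same assumptions as in Question 5. The fast route uses the PRESS identity $e_i^{\text{loo}} = e_i/(1 - h_{ii})$, where the leverages $h_{ii}$ are the diagonal entries of the hat matrix $H = XX^+$:

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
X = np.column_stack([np.ones(len(y)), X])
n = len(y)

# 1) Fast PRESS formula: scale each residual by 1/(1 - leverage).
H = X @ np.linalg.pinv(X)                    # hat (projection) matrix
h = np.diag(H)
resid = y - H @ y
print(np.sum((resid / (1 - h)) ** 2) / n)    # PRESS / n

# 2) Brute force: refit n times, each time leaving one observation out.
loo = 0.0
for i in range(n):
    mask = np.arange(n) != i
    beta = np.linalg.pinv(X[mask]) @ y[mask]
    loo += (y[i] - X[i] @ beta) ** 2
print(loo / n)                               # same value, much slower
```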

7. For the diabetes dataset, use the estimate

$$\frac{\|(I_n - XX^+)Y\|^2}{n} + \frac{2\sigma^2 p}{n},$$

where $\sigma^2 \approx 3000$, of the in-sample risk to decide if the following predictors should be jointly included/excluded in the linear model: age, glu, tch, ldl.

After making your decision about which features to include in the model matrix $X$, then estimating the corresponding coefficients $\hat\beta$, create a QQ-plot of the residuals $y - X\hat\beta$.

solution: The in-sample risk estimate using all of the predictors is 3137.9 (here p includes the constant feature); the in-sample risk estimate after removing the predictors is 3088.5. Thus, we prefer dropping these predictors.

The coefficients estimated after dropping the predictors are:

152.43, 233.31, 576.45, 287.26, 171.16, 197.03, 620.38

The residuals and the corresponding QQ-plot look like this:

[Figures: residual plot and QQ-plot of the residuals.]
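A sketch of the computation, assuming scikit-learn's copy of the data, in which the course's named columns are taken to map to age, sex, bmi, map, tc, ldl, hdl, tch, ltg, glu in that order:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
names = ['age', 'sex', 'bmi', 'map', 'tc', 'ldl', 'hdl', 'tch', 'ltg', 'glu']
sigma2 = 3000                                 # given noise-level estimate

def in_sample_risk(X, y, sigma2):
    # ||(I - XX^+)Y||^2 / n + 2*sigma^2*p / n
    n, p = X.shape
    r = y - X @ (np.linalg.pinv(X) @ y)
    return r @ r / n + 2 * sigma2 * p / n

X_full = np.column_stack([np.ones(len(y)), X])
keep = [i for i, nm in enumerate(names) if nm not in {'age', 'glu', 'tch', 'ldl'}]
X_red = np.column_stack([np.ones(len(y)), X[:, keep]])
print(in_sample_risk(X_full, y, sigma2), in_sample_risk(X_red, y, sigma2))

# QQ-plot of the residuals of the preferred (reduced) model.
resid = y - X_red @ (np.linalg.pinv(X_red) @ y)
stats.probplot(resid, dist="norm", plot=plt)
plt.show()
```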


8. For the diabetes dataset, compute a 95% numerical confidence interval for $\beta_j$ that corresponds to the predictor "age".

answer: approximately $[-381, -86]$
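A sketch of the interval, combining parts 2(b), 2(c), and 2(f): $\hat\beta_j \pm t_{n-p,\,0.975}\,\hat\sigma\sqrt{[(X^\top X)^+]_{jj}}$ (again assuming scikit-learn's copy of the data, with the constant in column 0 and age in column 1):

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
X = np.column_stack([np.ones(len(y)), X])    # column 0 = constant, column 1 = age
n, p = X.shape
j = 1

beta = np.linalg.pinv(X) @ y
resid = y - X @ beta
s2 = resid @ resid / (n - p)                 # unbiased estimate of sigma^2, part 2(b)
se = np.sqrt(s2 * np.linalg.pinv(X.T @ X)[j, j])
t = stats.t.ppf(0.975, n - p)                # 97.5% quantile of t_{n-p}, part 2(f)
print(beta[j] - t * se, beta[j] + t * se)
```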

9. Download the file risk.csv. The goal is to predict risk from the other variables. Do an F-test to check if the explanatory variables are all jointly relevant, and report the $R^2$.
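A sketch using statsmodels, assuming the response column in risk.csv is named risk; the fitted model reports the overall F-statistic (testing $H_0$: all slope coefficients are zero), its p-value, and $R^2$:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('risk.csv')
rhs = ' + '.join(c for c in df.columns if c != 'risk')
fit = smf.ols('risk ~ ' + rhs, data=df).fit()

print(fit.fvalue, fit.f_pvalue)   # joint F-test; a small p-value => jointly relevant
print(fit.rsquared)               # R^2
```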

10. Here we use the fish.csv dataset. The variables weight (in grams) and length (in millimetres) in this data set are the weights and lengths of 23 different catfish captured in the Kanawha River in Charleston, West Virginia. It was desired to estimate the angler harvest of channel catfish, and for live fish length is much easier to measure than weight. Hence it was of interest to study the length/weight relationship for channel catfish.

(a) Train a simple linear regression model with weight as response and length as predictor.

(b) It is conjectured that the weight of a fish varies with length by the following relationship:

$$\log_{10}(y) = \beta_0 + \beta_1 \log_{10}(x) + \epsilon,$$

where $y$ is the weight and $x$ is the length.

Train a simple linear regression model with log-weight as response and log-length as predictor.

(c) Plot a scatterplot and estimate the generalization risk for both models. Explain which model is preferable based on the scatterplot and generalization risk. Assume interest is in prediction of $y$ in natural units (not in log units).

answer: The coefficients on the raw data are: $b = [-884.31, \ 3.8444]$.
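A sketch of both fits, assuming fish.csv has columns named length and weight; since interest is in prediction in natural units, the log-log predictions are mapped back to grams before the squared-error risk is computed (in-sample error shown here for simplicity; a cross-validation estimate as in Questions 5-6 would be the fairer comparison):

```python
import numpy as np
import pandas as pd

df = pd.read_csv('fish.csv')
x, y = df['length'].to_numpy(), df['weight'].to_numpy()

# (a) weight on length, natural units
X1 = np.column_stack([np.ones(len(x)), x])
b = np.linalg.pinv(X1) @ y
pred1 = X1 @ b

# (b) log10(weight) on log10(length), predictions mapped back to grams
X2 = np.column_stack([np.ones(len(x)), np.log10(x)])
b_log = np.linalg.pinv(X2) @ np.log10(y)
pred2 = 10 ** (X2 @ b_log)                   # 10^(b0 + b1*log10(x))

# Compare squared-error risk in natural units (grams^2).
print(np.mean((y - pred1) ** 2), np.mean((y - pred2) ** 2))
```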

The following gives the visual diagnostics. The left plots correspond to the linear fit

$$b_0 + x_i b_1,$$

and the right plots correspond to the fit

$$10^{b_0 + b_1 \log_{10}(x_i)}.$$

From top to bottom we have the lines of best fit, the residuals, and QQ-plots of the residuals.



















[Figures: lines of best fit (weight in grams versus length, 200-500 mm), residual plots, and QQ-plots of the sample residuals versus standard normal quantiles, for the raw-scale model (left column) and the log-log model (right column).]
