STAT 3701 Homework 6 – Spring 2022

This homework is due on Friday, April 15, at 11:59 pm. Point values are given in parentheses for each part of each question. Submit your solutions in a pdf document on Canvas. Include your R code (which must be commented and properly indented) in the pdf file. Copying code from websites is not permitted. Cite all sources (including lecture notes). Show all of the steps that you took to solve each problem. Please name the pdf file <your last name>-HW6.pdf. Please also submit one text file with your R code, which must be commented and properly indented.

1. (10 points) Write an R function called gen.design that randomly generates a design matrix $X$ with $n$ rows and $p$ columns. Set $x_{i1} = 1$ and generate $(x_{i2}, \ldots, x_{ip})$ as a realization of $(X_{i2}, \ldots, X_{ip})$, where

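$$X_{ij} = \mu + A_i + Z_{ij}, \qquad (i,j) \in \{1, \ldots, n\} \times \{2, \ldots, p\},$$

where $\mu \in \mathbb{R}$; $A_1, \ldots, A_n$ are independent copies of the random variable $\sigma_X \sqrt{12\rho}\,(U - 0.5)$ with $U \sim \mathrm{Unif}(0, 1)$; the $Z_{ij}$ are iid with mean $0$ and variance $\sigma_X^2(1 - \rho)$, e.g. $N(0, \sigma_X^2(1 - \rho))$; and the $A_i$ and the $Z_{ij}$ are independent.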

Then $E(X_{ij}) = \mu$ and $\mathrm{var}(X_{ij}) = \sigma_X^2$ for $(i,j) \in \{1, \ldots, n\} \times \{2, \ldots, p\}$. Also, $\mathrm{cor}(X_{ik}, X_{im}) = \rho$ when $k \neq m$.
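
The stated moments can be checked directly. The sketch below rests on assumptions that the statement does not fully spell out: $U$ is uniform on $(0, 1)$, so that $E(U) = 0.5$ and $\mathrm{var}(U) = 1/12$; the $Z_{ij}$ are mutually independent with mean $0$ and variance $\sigma_X^2(1 - \rho)$; the $Z_{ij}$ are independent of the $A_i$; and $0 \le \rho \le 1$.

```latex
\begin{align*}
E(A_i) &= \sigma_X \sqrt{12\rho}\,\{E(U) - 0.5\} = 0, \\
\mathrm{var}(A_i) &= \sigma_X^2 \cdot 12\rho \cdot \mathrm{var}(U) = \sigma_X^2 \rho, \\
E(X_{ij}) &= \mu + E(A_i) + E(Z_{ij}) = \mu, \\
\mathrm{var}(X_{ij}) &= \mathrm{var}(A_i) + \mathrm{var}(Z_{ij})
  = \sigma_X^2 \rho + \sigma_X^2 (1 - \rho) = \sigma_X^2, \\
\mathrm{cov}(X_{ik}, X_{im}) &= \mathrm{var}(A_i) = \sigma_X^2 \rho \quad (k \neq m), \\
\mathrm{cor}(X_{ik}, X_{im}) &= \frac{\sigma_X^2 \rho}{\sigma_X^2} = \rho .
\end{align*}
```

Under these assumptions the shared row effect $A_i$ is the only source of correlation between two columns in the same row, which is why every pair of non-intercept columns has the same correlation $\rho$.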

This function has five arguments:

• n, the number of rows;

• p, the number of columns;

• rho, the value of $\rho$;

• sigma.X, the value of $\sigma_X$;

• mu, the value of $\mu$.

This function returns an n row by p column design matrix.

2. Let $y = (y_1, \ldots, y_n)'$ be the measured response for $n$ subjects. Assume that $y$ is a realization of

$$Y = X\beta + (\epsilon_1, \ldots, \epsilon_n)', \tag{1}$$

where $X$ is a design matrix, with $n$ rows and $p$ columns, generated from the model described in problem 1; $\beta = (\beta_1, \ldots, \beta_p)'$ is the vector of unknown regression coefficients; and $\epsilon_1, \ldots, \epsilon_n$ are iid $N(0, \sigma^2)$.

Set $p = 5$, $\sigma = 1$, $\sigma_X = 1$, $\mu = 0$, and $\beta = (1, 1, 0, 0, \beta_5)'$. This problem performs simulation studies for the test

$H_0: \beta_3 = \beta_4 = \beta_5 = 0$

$H_a: H_0$ is false,

at the 1% significance level.  Always use 10,000 independent replications to compute each simulation-based estimate.
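
For reference, the usual test of these hypotheses in the normal linear model compares the full fit with the fit that drops columns 3 to 5 of $X$. The sketch below uses the symbols $\mathrm{RSS}_0$ and $\mathrm{RSS}_1$, which are not in the original statement, for the residual sums of squares of the null and full models. It assumes $n > 5$ and that $X$ has full column rank.

```latex
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/3}{\mathrm{RSS}_1/(n - 5)},
\qquad
\text{reject } H_0 \text{ at the 1\% level when } F > F_{0.99;\,3,\,n-5},
```

where $F_{0.99;\,3,\,n-5}$ is the 0.99 quantile of the $F$ distribution with 3 and $n - 5$ degrees of freedom. A simulated estimate of the rejection probability is then the fraction of the 10,000 replications in which $H_0$ is rejected.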


(a) (10 points) Set $\beta_5 = 0.5$ and, for each $\rho \in \{0.3, 0.9\}$, use simulation to find the sample size $n$ so that the simulated estimate of the power of the test above is roughly 80%. (Note: There are two sample sizes to find, one for each value of $\rho$.)

(b) (10 points) Set $\beta_5 = 0.5$, $n = 315$, and $\rho = 0.9$. Report the following:

i. a simulated estimate of the probability that the AIC is smaller for the full model $H_0 \cup H_a$ than for the null model $H_0$ (i.e. the probability that AIC makes the correct choice);

ii. a simulated estimate of the probability that the BIC is smaller for the full model $H_0 \cup H_a$ than for the null model $H_0$ (i.e. the probability that BIC makes the correct choice);

iii. a simulated estimate of the probability that the F test (at the 1% significance level) makes the correct choice.

Compare the performances of these three model-selection procedures.
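
Because all three procedures depend on the same pair of fits, their decision rules can be put on a common scale. The sketch below assumes the usual Gaussian-likelihood definitions of AIC and BIC (minus twice the maximized log-likelihood plus $2k$ or $k \log n$, where $k$ is the number of parameters). The symbols $\mathrm{RSS}_0$, $\mathrm{RSS}_1$, and $c$ are introduced here for illustration and are not in the original statement: $\mathrm{RSS}_0$ and $\mathrm{RSS}_1$ are the residual sums of squares of the null and full models, and $c = F_{0.99;\,3,\,n-5}$ is the 1% critical value of the F test. The full model has three more parameters than the null model.

```latex
\begin{align*}
\text{AIC prefers the full model} &\iff n \log\!\left(\frac{\mathrm{RSS}_0}{\mathrm{RSS}_1}\right) > 2 \cdot 3 = 6, \\
\text{BIC prefers the full model} &\iff n \log\!\left(\frac{\mathrm{RSS}_0}{\mathrm{RSS}_1}\right) > 3 \log n, \\
\text{F test rejects } H_0 &\iff n \log\!\left(\frac{\mathrm{RSS}_0}{\mathrm{RSS}_1}\right) > n \log\!\left(1 + \frac{3c}{n - 5}\right).
\end{align*}
```

With $n = 315$, the BIC threshold is $3 \log 315 \approx 17.26$, well above the AIC threshold of 6, so BIC selects the full model less often than AIC whether or not the full model is correct.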

(c) (5 points) Set $n = 315$, $\beta_5 = 0$, and $\rho = 0.9$. Report each of the following estimates:

i. a simulated estimate of the Type I error probability of the F test of the hypotheses above;

ii. a simulated estimate of the probability that the AIC is smaller for the full model $H_0 \cup H_a$ than for the null model $H_0$ (i.e. the probability that AIC makes the incorrect choice);

iii. a simulated estimate of the probability that the BIC is smaller for the full model $H_0 \cup H_a$ than for the null model $H_0$ (i.e. the probability that BIC makes the incorrect choice).

Compare the performances of these three model-selection procedures.

3. Let $y = (y_1, \ldots, y_n)'$ be the measured response for $n$ subjects. Assume that $y$ is a realization of

$$Y = X\beta + (\epsilon_1, \ldots, \epsilon_n)', \tag{2}$$

where $X$ is a design matrix, with $n$ rows and $p$ columns; $\beta = (\beta_1, \ldots, \beta_p)'$ is the vector of unknown regression coefficients; and

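$$\epsilon_i = \sqrt{\tau}\, Z_{i-1} + \sqrt{1 - \tau}\, Z_i, \qquad i = 1, \ldots, n,$$

where $\tau = 0.5 + \sqrt{0.25 - \gamma^2}$, $\gamma \in [0, 0.5]$, and $Z_0, Z_1, \ldots, Z_n$ are iid $N(0, \sigma^2)$. Then $\epsilon_i \sim N(0, \sigma^2)$.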
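
A short check shows why $\gamma$ is the lag-one correlation of the errors. It takes the error model to be $\epsilon_i = \sqrt{\tau}\, Z_{i-1} + \sqrt{1 - \tau}\, Z_i$ with $\tau = 0.5 + \sqrt{0.25 - \gamma^2}$, and uses only the independence and common variance $\sigma^2$ of the $Z_i$. The covariance line applies for $i \ge 2$.

```latex
\begin{align*}
\mathrm{var}(\epsilon_i) &= \tau \sigma^2 + (1 - \tau) \sigma^2 = \sigma^2, \\
\mathrm{cov}(\epsilon_i, \epsilon_{i-1})
  &= \mathrm{cov}\bigl(\sqrt{\tau}\, Z_{i-1} + \sqrt{1 - \tau}\, Z_i,\;
                      \sqrt{\tau}\, Z_{i-2} + \sqrt{1 - \tau}\, Z_{i-1}\bigr)
   = \sqrt{\tau (1 - \tau)}\, \sigma^2, \\
\tau (1 - \tau)
  &= \bigl(0.5 + \sqrt{0.25 - \gamma^2}\bigr)\bigl(0.5 - \sqrt{0.25 - \gamma^2}\bigr)
   = 0.25 - (0.25 - \gamma^2) = \gamma^2, \\
\mathrm{cor}(\epsilon_i, \epsilon_{i-1}) &= \sqrt{\tau (1 - \tau)} = \gamma .
\end{align*}
```

Errors more than one index apart share no $Z$ term and are therefore independent. When $\gamma = 0$, $\tau = 1$ and the errors reduce to the iid case $\epsilon_i = Z_{i-1}$.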

Set $n = 30$, $p = 2$, $\sigma = 2$, and $\beta = (1, 5)'$. Create the design matrix $X$ so that $x_{i1} = 1$ and $x_{i2} = i/30$ for $i = 1, \ldots, 30$.

(a) (7 points) Perform a simulation study that confirms that $E(\hat{\beta}_2 - \beta_2) = 0$ when $\gamma = 0.45$.

(b) (8 points) For each $\gamma \in \{0, 0.25, 0.49\}$, perform a simulation study that computes a simulation-based 99% score approximate confidence interval for the coverage probability of the random 95% approximate confidence interval for $\beta_2$:

$$\hat{\beta}_2 \pm t_{0.975,\,28}\, S_\epsilon \sqrt{[(X'X)^{-1}]_{22}}.$$


Use 10,000 replications. What happens to the coverage probability as $\gamma = \mathrm{cor}(\epsilon_i, \epsilon_{i-1})$ increases?
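
One common reading of a "score approximate confidence interval" for a proportion is the Wilson score interval. That reading is an assumption, and the lecture notes may define the interval differently. In the sketch below, $\hat{c}$ and $m$ are symbols introduced for illustration: $\hat{c}$ is the fraction of the $m = 10{,}000$ replications in which the 95% interval contains $\beta_2$, and $z = z_{0.995} \approx 2.576$ is the standard normal quantile for a 99% interval.

```latex
\frac{\hat{c} + \dfrac{z^2}{2m} \;\pm\; z \sqrt{\dfrac{\hat{c}(1 - \hat{c})}{m} + \dfrac{z^2}{4 m^2}}}
     {1 + \dfrac{z^2}{m}}
```

In R, prop.test with correct = FALSE and conf.level = 0.99 reports an interval of this form, which can serve as a cross-check on a hand-coded version.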