MA 3502

Regression Analysis and Experimental Design

Spring 2020-2021

Question 1.

1a. Consider the linear regression model Y = Xθ + ε, where θ is an m-vector of unknown parameters, Y is an N-vector of observations, X is a design matrix of size N × m and ε is a vector of normal random errors with mean 0 and covariance matrix σ² I_N. Here σ² is some positive unknown constant and I_N is the identity matrix of size N × N. Assume that rank(X) = p ≤ m.

(i) Give a geometrical interpretation of the least squares estimator (LSE) of θ.

(ii) Define the lasso estimators of θ and discuss a motivation behind these estimators.

(iii) Define the ridge estimator of θ and briefly discuss its properties.

(iv) Let Z be a vector of size m such that the linear form Z^T θ is estimable. Explain the method of constructing confidence intervals for Z^T θ.

(v) Derive the maximum likelihood estimator for σ².

[10=2+2+2+2+2]

1b. Assume m = 3, N = 5 and the model yj = θ0 + θ1 xj + θ2 xj² + εj. The table below displays the design points and the corresponding results:

xj	0	u	−u	v	−v
yj	0	1	−1	1	−1

Here u = d4 + 1 and v = d4 + 2, where d4 is the fourth digit of your student number.

(i) Test the hypothesis that the complete regression model is statistically significant at the 95% confidence level. Write down the corresponding ANOVA table and compute R².

(ii) Test the hypothesis

H0 : …

at the 95% confidence level.

To answer this question please use 19.0 as an approximate value for the 0.95-quantile of the F-distribution with 2 and 2 degrees of freedom.

[15=7+8]
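The overall F-test in part (i) of Question 1b can be checked numerically. The sketch below is illustrative only: it assumes d4 = 0 (so u = 1 and v = 2) as a hypothetical student digit, and the helper name `overall_f_test` is not part of the question. It fits the quadratic model by least squares and returns the ANOVA sums of squares, the F statistic and R².

```python
import numpy as np

def overall_f_test(x, y):
    """Fit y = t0 + t1*x + t2*x^2 by least squares and return ANOVA quantities."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x**2])   # design matrix, N x 3
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)      # least squares estimate
    fitted = X @ theta
    ss_total = float(np.sum((y - y.mean()) ** 2))      # corrected total sum of squares
    ss_error = float(np.sum((y - fitted) ** 2))
    ss_reg = ss_total - ss_error
    df_reg = X.shape[1] - 1                            # slopes only, intercept excluded
    df_err = len(y) - X.shape[1]
    f_stat = (ss_reg / df_reg) / (ss_error / df_err)
    r2 = ss_reg / ss_total
    return theta, ss_reg, ss_error, f_stat, r2

d4 = 0                      # ASSUMPTION: hypothetical fourth digit
u, v = d4 + 1, d4 + 2
x = [0, u, -u, v, -v]
y = [0, 1, -1, 1, -1]

theta, ss_reg, ss_error, f_stat, r2 = overall_f_test(x, y)
print("theta =", theta)
print("SS_reg =", ss_reg, "SS_error =", ss_error)
print("F =", f_stat, "R^2 =", r2)
```

The F statistic is then compared with the quoted quantile 19.0 (2 and 2 degrees of freedom); the regression is declared significant only if F exceeds it.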

Question 2.

2a. Consider the linear regression model Y = Xθ + ε, where θ is an m-vector of unknown parameters, Y is an N-vector of observations, X is a design matrix of size N × m and ε is a vector of normal random errors with mean 0 and covariance matrix σ² I_N.

In a particular application, m = 4, N = 5 and the model is Y = θ1 X1 + θ2 X2 + θ3 X3 + θ4 X4 + ε. The table below displays the design points and the corresponding results:

X1	u	-1	1	0	-1
X2	−v	1	0	1	-1
X3	u − v	0	1	1	-2
X4	u+2v	-3	1	-2	1
Y	1	1	1	-1	0

Here u = d4 + 1, v = d5 + 1 and dj is the j-th digit of your student number.

Calculate p = rank(X) and hence compute s² = SS_error/(N − p). [8]
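A numerical check of the rank and of s² can be sketched as follows. The digits d4 = 0 and d5 = 0 (so u = 1, v = 1) are hypothetical placeholders, not values given in the question. Since X may be rank deficient, the sketch uses `numpy.linalg.lstsq`, which returns a minimum-norm solution whose fitted values are still the projection of Y onto the column space of X.

```python
import numpy as np

d4, d5 = 0, 0               # ASSUMPTION: hypothetical student digits
u, v = d4 + 1, d5 + 1

# Rows are the five observations; columns are X1, X2, X3, X4 from the table.
X = np.array([
    [u,  -v, u - v, u + 2 * v],
    [-1,  1,     0,        -3],
    [1,   0,     1,         1],
    [0,   1,     1,        -2],
    [-1, -1,    -2,         1],
], dtype=float)
Y = np.array([1, 1, 1, -1, 0], dtype=float)

p = int(np.linalg.matrix_rank(X))                 # p = rank(X)
theta, *_ = np.linalg.lstsq(X, Y, rcond=None)     # one least squares solution
residuals = Y - X @ theta                         # same for every LS solution
ss_error = float(residuals @ residuals)
s2 = ss_error / (len(Y) - p)

print("rank(X) =", p)
print("SS_error =", ss_error, "s^2 =", s2)
```

Inspecting the columns by hand (for example, comparing X3 with X1 + X2) is a useful cross-check on the computed rank.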

2b. Assume a weighted linear regression model Y = θ0 + θ1 X1 + θ2 X2 + ε, where the random vector ε has mean 0 and covariance matrix Dε = σ² W with σ² > 0, N = 5 and

    ( …             )
    ( …             )
W = ( 0  0  1  0  0 ) .
    ( 0  0  0  1  0 )
    ( …             )

The data is summarized in the following table:

X1	0	1	1	-1	-1
X2	0	1	-1	1	-1
Y	1	2	3	4	5

where dj is the j-th digit of your student number. Compute Dθ̂, the covariance matrix of θ̂ = (X^T X)^(−1) X^T Y, the standard LSE of θ. [7]
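Because the standard LSE is a linear transform of Y, its covariance matrix follows from the rule D(AY) = A (DY) A^T with A = (X^T X)^(−1) X^T. A short worked form, assuming X has full column rank so that X^T X is invertible:

```latex
\begin{aligned}
D\hat\theta &= (X^{T}X)^{-1}X^{T}\,(D\varepsilon)\,X\,(X^{T}X)^{-1} \\
            &= \sigma^{2}\,(X^{T}X)^{-1}X^{T}WX\,(X^{T}X)^{-1}.
\end{aligned}
```

This reduces to the familiar σ²(X^T X)^(−1) only when W is the identity matrix.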

2c. In a particular application, the model is yj = α + βxj + εj and an experiment was performed at 3 different points with different measurements that can be considered independent and identically normally distributed. The table below displays the design points and the corresponding results:

xj	-1	0	1
yj	d3 + 1	d3 + 1	d3 + 3

where d3 is the third digit of your student number. Using Cook's distance, find the most influential observation(s).

[10]
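Cook's distances for the three observations in Question 2c can be computed directly from the hat matrix. The sketch below assumes d3 = 0 as a hypothetical digit and uses the standard definition D_i = e_i² h_ii / (p s² (1 − h_ii)²), with p = 2 parameters and s² = SS_error/(N − p); the function name `cooks_distances` is illustrative.

```python
import numpy as np

def cooks_distances(x, y):
    """Cook's distance of each observation for the line y = alpha + beta*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
    h = np.diag(H)                            # leverages h_ii
    resid = y - H @ y                         # ordinary residuals e_i
    s2 = float(resid @ resid) / (n - p)       # residual variance estimate
    return resid**2 * h / (p * s2 * (1 - h) ** 2)

d3 = 0                                        # ASSUMPTION: hypothetical third digit
x = [-1, 0, 1]
y = [d3 + 1, d3 + 1, d3 + 3]

cooks = cooks_distances(x, y)
print("Cook's distances:", cooks)
print("most influential:", np.flatnonzero(np.isclose(cooks, cooks.max())) + 1)
```

Adding the same constant d3 to every yj shifts only the intercept, so the residuals and hence the distances do not depend on the digit.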

Question 3.

3a. Consider a general linear regression scheme yj = η(xj, θ) + εj (j = 1, …, N), where η(x, θ) = θ^T f(x) = Σ_i θi fi(x), x ∈ X, and ε1, …, εN are uncorrelated random errors with E(εj) = 0 and var(εj) = σ² (j = 1, …, N). Let θ̂ be the least squares estimator of θ.

(i) Give the definitions of an exact design and an approximate design.

(ii) Give a geometrical interpretation of the criterion of D-optimality.

(iii) Derive the relationship between the variance of the estimator θ̂^T f(x0) of η(x0, θ), the response at a point x0, and the covariance matrix of the design corresponding to the observations at points x1, …, xN. On the basis of this derivation, define the criterion of extrapolation to the point x0.

(iv) Define the criterion of Q-optimality and specialize this criterion for the case where X = [0, 1] and f(x) = (1, x, x²)^T.

[10=2+2+3+3]

3b. Consider the regression model yj = η(xj, θ) + εj, where η(x, θ) = θ0 + θ1 x + θ2 x², θ = (θ0, θ1, θ2), x ∈ [−1, 1] and ε1, …, εN are uncorrelated random errors with E(εj) = 0 and var(εj) = σ² (j = 1, …, N). Consider the following two designs:

     ( −1     0      1 )
ξ1 = (  α   1 − 2α   α ),

．．, − 1

．． 4

−α

1 、

．

4 ．

where α = (d4 + 1)/50 and d4 is the fourth digit of your student number.

(i) Find out whether either of these two designs dominates the other.

(ii) Compare these two designs with respect to the D-optimality criterion and the criterion of extrapolation at the point x0 = 0.

(iii) Check optimality of the design ξ1 with respect to the criterion of extrapolation at the point x0 = 0.

[15=5+5+5]
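The quantities compared in Question 3b can be computed from the information matrix M(ξ) = Σ w_i f(x_i) f(x_i)^T. The sketch below is a minimal illustration under stated assumptions: it reads ξ1 as placing weights α, 1 − 2α, α at the points −1, 0, 1, takes d4 = 0 as a hypothetical digit, and uses the normalised variance f(x0)^T M(ξ)^(−1) f(x0) as the extrapolation criterion. The function names are illustrative, and any other design can be passed in the same way.

```python
import numpy as np

def f(x):
    """Regressors of the quadratic model: f(x) = (1, x, x^2)."""
    return np.array([1.0, x, x**2])

def information_matrix(points, weights):
    """M(xi) = sum_i w_i f(x_i) f(x_i)^T for an approximate design."""
    return sum(w * np.outer(f(x), f(x)) for x, w in zip(points, weights))

def extrapolation_variance(M, x0):
    """Normalised variance f(x0)^T M^{-1} f(x0) of the estimated response at x0."""
    fx = f(x0)
    return float(fx @ np.linalg.solve(M, fx))

d4 = 0                                  # ASSUMPTION: hypothetical fourth digit
alpha = (d4 + 1) / 50

points = [-1.0, 0.0, 1.0]
weights = [alpha, 1 - 2 * alpha, alpha]

M1 = information_matrix(points, weights)
det1 = float(np.linalg.det(M1))         # larger determinant is better for D-optimality
var0 = extrapolation_variance(M1, 0.0)  # smaller is better for extrapolation at x0 = 0

print("M(xi1) =\n", M1)
print("det M(xi1) =", det1)
print("variance at x0 = 0:", var0)
```

Dominance between two designs can be examined in the same spirit by checking whether the difference of their information matrices is non-negative definite, for instance through `numpy.linalg.eigvalsh`.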

Question 4.

4a.

(i) What are the aliases of θCD in the incomplete polynomial model of order 3 for the 2^(9−3) factorial design for factors A, B, C, D, E, F, G, H, I with defining contrasts ABCD, CDEF and −EGHI? What is the resolution of the design?

(ii) Draw the Fano diagram and construct the corresponding BIBD.

(iii) Let Λ1 be a 5 × 5 Latin square constructed by the formula z = ax + by + c (mod 5), where x is the row number, y is the column number, z is the symbol, a = 1, b = 2 and c = 3. Construct two more 5 × 5 Latin squares Λ2 and Λ3 so that the three Latin squares are pairwise orthogonal, Λ2 has the canonical form and Λ3 is diagonal.

(iv) Consider a 3^(4−2) factorial design for factors A, B, C, D with defining relations A + B + C = 1 (mod 3) and A + 2B + 2D = 2 (mod 3). Write down the corresponding pair of Latin squares.

Hint: Relate A to the row number, B to the column number and C, D to the symbols of the two Latin squares.

[14=2+2+5+5]
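For part (iii) of Question 4a, squares of the form z = ax + by + c (mod 5) are easy to generate and test by machine. The sketch below assumes rows, columns and symbols are numbered 0 to 4 (the question does not fix the numbering), and the second square built with a = 1, b = 1, c = 0 is only a hypothetical candidate used to demonstrate the orthogonality check, not a claimed answer.

```python
def latin_square(a, b, c, n=5):
    """Square with entry (a*x + b*y + c) mod n in row x, column y."""
    return [[(a * x + b * y + c) % n for y in range(n)] for x in range(n)]

def is_latin(sq):
    """Every symbol appears exactly once in each row and each column."""
    n = len(sq)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in sq)
    cols_ok = all({sq[x][y] for x in range(n)} == symbols for y in range(n))
    return rows_ok and cols_ok

def are_orthogonal(sq1, sq2):
    """Superimposing the squares yields each ordered pair of symbols once."""
    n = len(sq1)
    pairs = {(sq1[x][y], sq2[x][y]) for x in range(n) for y in range(n)}
    return len(pairs) == n * n

L1 = latin_square(1, 2, 3)            # the square given in the question
candidate = latin_square(1, 1, 0)     # HYPOTHETICAL second square for the demo

for row in L1:
    print(row)
print("L1 is Latin:", is_latin(L1))
print("L1 orthogonal to candidate:", are_orthogonal(L1, candidate))
```

The same two checks can be applied to any proposed Λ2 and Λ3 before verifying the canonical-form and diagonal conditions by eye.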

4b. Assume that there are 10000 units with d5 + 3 or fewer of them defective, where d5 is the fifth digit of your student number. Assume we are using a two-stage group testing strategy to detect the defective units. What is the minimal number of experiments needed? [4]

4c. Assume that 5 coins are weighed on a spring balance (one-pan) scales. It is known before weighing that no more than 2 coins are false and that the false coins are lighter than the genuine ones. An experiment is required to determine the false coins, where the result in a particular test is 1 if there is at least one false coin in a test group and 0 otherwise. Consider the following design:

obs	coin 1	coin 2	coin 3	coin 4	coin 5
1	+	+	–	–	–
2	+	–	+	–	–
3	+	–	–	+	–

(i) Construct the output table for this design. Compute the entropy of the partition of the set of all possible answers.

(ii) Answer the question stated in (i) assuming that it is known before weighing that exactly 2 coins are false and the result in a particular test is the number of false coins in a test group.

[7]
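The output table and the entropy in Question 4c can be enumerated by brute force. The sketch below is a minimal illustration: it assumes all admissible sets of false coins are equally likely and measures entropy in bits (base-2 logarithm), neither of which is stated in the question, and the names `outcome` and `partition_entropy` are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import log2

DESIGN = [(1, 2), (1, 3), (1, 4)]     # coins marked "+" in observations 1, 2, 3
COINS = range(1, 6)

def outcome(false_coins, count=False):
    """Vector of test results for a given set of false coins."""
    result = []
    for group in DESIGN:
        k = len(set(group) & set(false_coins))
        # count=False: 1 if the group holds any false coin; count=True: how many
        result.append(k if count else int(k > 0))
    return tuple(result)

def partition_entropy(hypotheses, count=False):
    """Group hypotheses by outcome; entropy assumes a uniform prior (ASSUMPTION)."""
    cells = Counter(outcome(h, count) for h in hypotheses)
    total = sum(cells.values())
    entropy = -sum(n / total * log2(n / total) for n in cells.values())
    return cells, entropy

# Part (i): at most two false coins, binary test results.
at_most_two = [h for k in range(3) for h in combinations(COINS, k)]
cells_i, entropy_i = partition_entropy(at_most_two)

# Part (ii): exactly two false coins, results count the false coins in the group.
exactly_two = list(combinations(COINS, 2))
cells_ii, entropy_ii = partition_entropy(exactly_two, count=True)

for h in at_most_two:
    print(h, outcome(h))
print("entropy (i):", entropy_i)
print("entropy (ii):", entropy_ii)
```

An outcome cell containing more than one hypothesis marks a set of answers that the design cannot tell apart.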