
Semester 1 Assessment, 2021

MAST30025 Linear Statistical Models

Question 1 (10 marks)

Let $A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}$ and $B = \begin{pmatrix} 5 & 4 \\ 10 & 8 \end{pmatrix}$.

(a) [3 marks] Show that A is positive definite.
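A quick numerical check of part (a). This assumes the garbled matrix in the question reads $A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}$, which is the reading under which $A$ is positive definite. A symmetric matrix is positive definite exactly when all its eigenvalues are positive, or equivalently when all its leading principal minors are positive. The sketch checks both criteria.

```r
# Assumption: A is read from the garbled source as [5 4; 4 5]
A <- matrix(c(5, 4,
              4, 5), nrow = 2, byrow = TRUE)

# Criterion 1: every eigenvalue of the symmetric matrix A is positive
ev <- eigen(A, symmetric = TRUE)$values
print(ev)

# Criterion 2: every leading principal minor of A is positive
minors <- sapply(1:nrow(A), function(k) det(A[1:k, 1:k, drop = FALSE]))
print(minors)

is_pd <- all(ev > 0) && all(minors > 0)
print(is_pd)
```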

(b) [3 marks] Find all possible values of $c$ such that $c(A - I)$ is idempotent.
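Part (b) can be sanity-checked numerically, again assuming $A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}$. Then $A - I = 4J$, where $J$ is the $2 \times 2$ matrix of ones, so $(A - I)^2 = 8(A - I)$ and $c(A - I)$ is idempotent precisely when $8c^2 = c$. The sketch confirms the identity and tests a few candidate values of $c$; the helper `is_idempotent` is a hypothetical name introduced only for this check.

```r
# Assumption: A = [5 4; 4 5], the same reading as in part (a)
A <- matrix(c(5, 4, 4, 5), nrow = 2)
M <- A - diag(2)                       # A - I = 4 * J, J = 2x2 matrix of ones

# (A - I)^2 = 8 (A - I), so c(A - I) is idempotent iff 8 c^2 = c
print(all.equal(M %*% M, 8 * M))

# Hypothetical helper: P is idempotent when P %*% P equals P
is_idempotent <- function(P) isTRUE(all.equal(P %*% P, P))

candidates <- c(0, 1/8, 1/4, 1)
print(sapply(candidates, function(cc) is_idempotent(cc * M)))
```

Only $c = 0$ and $c = 1/8$ pass, matching the roots of $8c^2 = c$.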

(c) [2 marks] Find a conditional inverse for $B$.
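One standard way to build a conditional inverse $B^c$ (any matrix with $BB^cB = B$) is to take a nonsingular $r \times r$ submatrix of a rank-$r$ matrix, invert it, and pad with zeros. The sketch below assumes the garbled matrix reads $B = \begin{pmatrix} 5 & 4 \\ 10 & 8 \end{pmatrix}$ (rank 1) and uses the leading block, which works here because $b_{11} = 5 \neq 0$. Under the other plausible reading, $\begin{pmatrix} 10 & 8 \\ 5 & 4 \end{pmatrix}$, the same construction puts $1/10$ in the $(1,1)$ position.

```r
# Assumption: B is read from the garbled source as [5 4; 10 8]
B <- matrix(c(5, 4,
              10, 8), nrow = 2, byrow = TRUE)
r <- qr(B)$rank                        # numerical rank of B (here 1)

# Invert the leading r x r block (nonsingular here since B[1, 1] != 0)
# and place it in a zero matrix with the transposed shape of B
Bc <- matrix(0, nrow = ncol(B), ncol = nrow(B))
Bc[1:r, 1:r] <- solve(B[1:r, 1:r, drop = FALSE])
print(Bc)

# Defining property of a conditional inverse: B Bc B = B
print(all.equal(B %*% Bc %*% B, B))
```

The leading-block choice is valid whenever that block is nonsingular and has the same rank as $B$. Otherwise, a different nonsingular submatrix must be chosen.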

(d) [2 marks] Let $\mathbf{y} = (y_1, y_2)^T$. Find

Question 2 (13 marks)

Let

$\mathbf{y} = (y_1, y_2, y_3)^T \sim MVN(\boldsymbol{\mu}, \Sigma)$,

and $\mathbf{x} = (y_1, y_2)^T \sim MVN(\mathbf{u}, V)$.

(a) [2 marks] Find $\mathbf{u}$ and $V$.

(b) [3 marks] Describe the distribution of .


(c) [4 marks] Describe the distribution of $(y_1^2 + y_2^2 + y_1 y_2)$.


(d) [4 marks] Describe the distribution of $(y_1^2 + y_2^2 + y_1 y_2) + y_3^2$.


Question 3 (17 marks)

In this question, we study the number of hot dogs sold in a stadium over 20 days. The dataset contains the following variables:

• sales ($y$): the number of hot dogs sold on the day
• temp ($x_1$): the maximum temperature on the day
• nppl ($x_2$): the number of people in the stadium on the day (in thousands).

The data are stored in the saledata data frame, and the following R calculations are performed:

> X = cbind(rep(1,n), saledata$temp, saledata$nppl)
> y = saledata$sales

> t(X)%*%X
     [,1]   [,2] [,3]
[1,] 20.0   -4.8 33.8
[2,] -4.8 1104.8  6.6
[3,] 33.8    6.6 70.0

> solve(t(X)%*%X)
       [,1]     [,2]    [,3]
[1,]  0.278  0.00202 -0.1345
[2,]  0.002  0.00092 -0.0011
[3,] -0.134 -0.00106  0.0794


> t(X)%*%y
     [,1]
[1,]  319
[2,] 2381
[3,]  661


> t(y)%*%y
      [,1]
[1,] 11271


> qt(0.975,15:20)
[1] 2.131 2.120 2.110 2.101 2.093 2.086
> qf(0.95,1,15:20)
[1] 4.543 4.494 4.451 4.414 4.381 4.351
> qf(0.95,2,15:20)
[1] 3.682 3.634 3.592 3.555 3.522 3.493
> qf(0.95,3,15:20)
[1] 3.287 3.239 3.197 3.160 3.127 3.098


(a) [2 marks] Calculate the least squares estimates of the parameters.


(b) [2 marks] Calculate the sample variance $s^2$.
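Parts (a) and (b) reduce to matrix arithmetic on the printed quantities: $\mathbf{b} = (X^TX)^{-1}X^T\mathbf{y}$ and $s^2 = (\mathbf{y}^T\mathbf{y} - \mathbf{b}^TX^T\mathbf{y})/(n - p)$, with $n = 20$ and $p = 3$. Below is a minimal R sketch that uses the printed (rounded) matrices as they appear. Because $SS_{Res}$ is a small difference of two large numbers, $s^2$ is sensitive to that rounding, so its later digits should not be trusted.

```r
# Quantities copied from the printed R output (rounded as printed)
XtX_inv <- matrix(c( 0.278,  0.00202, -0.1345,
                     0.002,  0.00092, -0.0011,
                    -0.134, -0.00106,  0.0794), nrow = 3, byrow = TRUE)
Xty <- c(319, 2381, 661)
yty <- 11271
n <- 20
p <- 3                                 # intercept, temp, nppl

# Least squares estimates: b = (X'X)^{-1} X'y
b <- drop(XtX_inv %*% Xty)
print(b)

# SS_Res = y'y - b'X'y, and s^2 = SS_Res / (n - p)
ss_res <- yty - sum(b * Xty)
s2 <- ss_res / (n - p)
print(s2)
```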


(c) [2 marks] Calculate a 95% confidence interval for the parameter corresponding to temp.


(d) [3 marks] Test for model relevance at the 5% significance level.
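Parts (c) and (d) use the same printed quantities. The interval is $b_1 \pm t_{n-p,\,0.975}\, s \sqrt{c_{11}}$, where $c_{11}$ is the diagonal entry of $(X^TX)^{-1}$ for temp (row and column 2 in R's indexing). Model relevance tests $H_0: \beta_1 = \beta_2 = 0$ with $F = \dfrac{(\mathbf{b}^TX^T\mathbf{y} - (\sum y_i)^2/n)/(p-1)}{s^2}$. The sketch uses the printed (rounded) matrices as-is, so the numbers are approximate. It also uses `qt`/`qf` directly instead of reading the tabulated quantiles.

```r
# Quantities copied from the printed R output (rounded as printed)
XtX_inv <- matrix(c( 0.278,  0.00202, -0.1345,
                     0.002,  0.00092, -0.0011,
                    -0.134, -0.00106,  0.0794), nrow = 3, byrow = TRUE)
Xty <- c(319, 2381, 661)
yty <- 11271
n <- 20
p <- 3

b <- drop(XtX_inv %*% Xty)
s2 <- (yty - sum(b * Xty)) / (n - p)

# (c) 95% CI for the temp coefficient (second entry of b)
t_crit <- qt(0.975, df = n - p)
half_width <- t_crit * sqrt(s2 * XtX_inv[2, 2])
ci_temp <- b[2] + c(-1, 1) * half_width
print(ci_temp)

# (d) Model relevance: SS_Reg corrected for the mean. The first entry of X'y
# is sum(y), because the first column of X is a column of ones.
ss_reg_full <- sum(b * Xty)
ss_reg_mean <- Xty[1]^2 / n
f_stat <- ((ss_reg_full - ss_reg_mean) / (p - 1)) / s2
f_crit <- qf(0.95, p - 1, n - p)
print(c(F = f_stat, critical = f_crit))
print(f_stat > f_crit)                 # TRUE means reject H0 at the 5% level
```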


(e) [4 marks] Consider the linear model without an intercept,

$y = \beta_1 x_1 + \beta_2 x_2 + \varepsilon$.

Calculate the least squares estimates of the parameters.
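Dropping the intercept removes the column of ones from $X$. The design matrix $X_1$ for part (e) therefore has $X_1^TX_1$ and $X_1^T\mathbf{y}$ equal to the temp/nppl rows and columns of the printed $X^TX$ and $X^T\mathbf{y}$, so no new data are needed. A minimal sketch, assuming the printed (rounded) values:

```r
# Printed X'X and X'y from the R output above
XtX <- matrix(c(20.0,   -4.8, 33.8,
                -4.8, 1104.8,  6.6,
                33.8,    6.6, 70.0), nrow = 3, byrow = TRUE)
Xty <- c(319, 2381, 661)

# Keep only the rows/columns for temp and nppl (drop the intercept)
idx <- 2:3
b_noint <- solve(XtX[idx, idx], Xty[idx])   # solves (X1'X1) b = X1'y
print(b_noint)
```

Using `solve(A, v)` solves the $2 \times 2$ normal equations directly instead of forming the inverse first.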


(f) [4 marks] Test the hypothesis $H_0: \beta_0 = 0$ in the full model (including the intercept), using an F-test at the 5% significance level.
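For part (f), the F statistic compares the full model with the no-intercept model: $F = \dfrac{SS_{Reg}(\text{full}) - SS_{Reg}(\text{reduced})}{1 \cdot s^2}$. For a single restriction this equals the square of the $t$ statistic $b_0 / (s\sqrt{c_{00}})$ in exact arithmetic. With the rounded printed matrices the two values do not agree. The SS difference is a small difference of large numbers and is very sensitive to rounding, so the sketch computes both and treats the values as approximate. The test decision is what the comparison is meant to show.

```r
# Printed quantities (rounded as printed)
XtX <- matrix(c(20.0,   -4.8, 33.8,
                -4.8, 1104.8,  6.6,
                33.8,    6.6, 70.0), nrow = 3, byrow = TRUE)
XtX_inv <- matrix(c( 0.278,  0.00202, -0.1345,
                     0.002,  0.00092, -0.0011,
                    -0.134, -0.00106,  0.0794), nrow = 3, byrow = TRUE)
Xty <- c(319, 2381, 661)
yty <- 11271
n <- 20
p <- 3

# Full model (with intercept)
b_full <- drop(XtX_inv %*% Xty)
ss_reg_full <- sum(b_full * Xty)
s2 <- (yty - ss_reg_full) / (n - p)

# Reduced model (no intercept): temp and nppl rows/columns only
idx <- 2:3
b_red <- solve(XtX[idx, idx], Xty[idx])
ss_reg_red <- sum(b_red * Xty[idx])

# F statistic for one restriction (beta_0 = 0)
f_stat <- (ss_reg_full - ss_reg_red) / 1 / s2
# Equivalent form in exact arithmetic: square of the t statistic for b_0
f_t <- b_full[1]^2 / (s2 * XtX_inv[1, 1])
f_crit <- qf(0.95, 1, n - p)
print(c(F_from_SS = f_stat, F_from_t = f_t, critical = f_crit))
```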


Question 4 (8 marks)

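Consider the general full rank linear model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. For given inputs $\mathbf{x}^*$, we wish to predict the difference of two responses with these inputs,

$y_1^* = (\mathbf{x}^*)^T \boldsymbol{\beta} + \varepsilon_1^*$,

$y_2^* = (\mathbf{x}^*)^T \boldsymbol{\beta} + \varepsilon_2^*$,

where $\varepsilon_1^* \sim N(0, \sigma^2)$, $\varepsilon_2^* \sim N(0, \sigma^2)$, and $\varepsilon_2^*$ is independent of $\varepsilon_1^*$. Let $\mathbf{b}$ be the least squares estimator of $\boldsymbol{\beta}$ and $s^2$ be the sample variance.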

(a) [1 mark] Write down the predictor of $y_1^* - y_2^*$.


(b) [2 marks] Find the distribution of the prediction error.


(c) [3 marks] Formulate a $t$-distributed random variable based on the prediction error, and specify its degrees of freedom.


(d) [2 marks] Based on (c), construct a $100(1 - \alpha)\%$ prediction interval for $y_1^* - y_2^*$.
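A small simulation can sanity-check the reasoning behind parts (a) to (d). Both new responses share the mean $(\mathbf{x}^*)^T\boldsymbol{\beta}$, so the natural predictor $(\mathbf{x}^*)^T\mathbf{b} - (\mathbf{x}^*)^T\mathbf{b}$ is identically zero. The prediction error $\varepsilon_1^* - \varepsilon_2^* \sim N(0, 2\sigma^2)$ is independent of $s^2$, provided the new errors are independent of the training errors (assumed here), which gives $(y_1^* - y_2^*)/(s\sqrt{2}) \sim t_{n-p}$. The design matrix, true coefficients, error standard deviation and new inputs below are hypothetical choices made only for the simulation. If the interval is right, its empirical coverage should be close to $1 - \alpha$.

```r
set.seed(1)
n <- 30
X <- cbind(1, runif(n), runif(n))      # hypothetical design matrix, p = 3
p <- ncol(X)
beta <- c(1, 2, -1)                    # hypothetical true coefficients
sigma <- 1.5                           # hypothetical error standard deviation
xstar <- c(1, 0.5, 0.5)                # hypothetical new inputs
alpha <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - p)

n_sim <- 10000
covered <- logical(n_sim)
for (i in seq_len(n_sim)) {
  y <- drop(X %*% beta) + rnorm(n, sd = sigma)
  fit <- lm.fit(X, y)                  # least squares fit without formula overhead
  b <- fit$coefficients
  s <- sqrt(sum(fit$residuals^2) / (n - p))
  # Two new responses at the same inputs, with independent N(0, sigma^2) errors
  y1 <- sum(xstar * beta) + rnorm(1, sd = sigma)
  y2 <- sum(xstar * beta) + rnorm(1, sd = sigma)
  pred <- sum(xstar * b) - sum(xstar * b)  # predictor of y1 - y2 (always 0)
  half <- t_crit * s * sqrt(2)             # half-width of the prediction interval
  covered[i] <- abs((y1 - y2) - pred) <= half
}
coverage <- mean(covered)
print(coverage)
```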


Question 5 (16 marks)

A small university collected data on the salaries of its faculty members in the early 1980s. The dataset contains the following variables:

• degree: Highest qualification (Masters or PhD)
• rank: Faculty position (Asst, Assoc, or Prof)
• sex: Gender (Male or Female)
• year: Years in current rank
• ysdeg: Years since highest degree
• salary: Current salary ($k/yr)

We wish to model the salary of the faculty members in terms of the other variables. The following R output is produced:

> model <- lm(salary ~ ., data=salary)
> model2 <- lm(salary ~ . * sex, data=salary)
> anova(model, model2)
Analysis of Variance Table

Model 1: salary ~ degree + rank + sex + year + ysdeg
Model 2: salary ~ (degree + rank + sex + year + ysdeg) * sex
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     45 258.86
2     40 241.45  5    17.406 0.5767 0.7174
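The F statistic in this ANOVA table can be reproduced from the printed residual sums of squares: $F = \dfrac{(RSS_1 - RSS_2)/(df_1 - df_2)}{RSS_2/df_2}$, referred to an $F_{df_1 - df_2,\, df_2}$ distribution. A minimal sketch using only the printed values, so the last digit may differ slightly from R's:

```r
rss1 <- 258.86; df1 <- 45    # Model 1: main effects only
rss2 <- 241.45; df2 <- 40    # Model 2: main effects plus interactions with sex

f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_value <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

The large p-value means the interactions with sex do not significantly improve the fit, which supports keeping the simpler model.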

> summary(model)

Call:
lm(formula = salary ~ ., data = salary)

Residuals:
    Min      1Q  Median      3Q     Max
-4.0452 -1.0947 -0.3615  0.8132  9.1931

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.74605    0.80018  19.678  < 2e-16 ***
degreePhD    1.38861    1.01875   1.363    0.180
rankAssoc    5.29236    1.14540   4.621 3.22e-05 ***
rankProf    11.11876    1.35177   8.225 1.62e-10 ***
sexFemale    1.16637    0.92557   1.260    0.214
year         0.47631    0.09491   5.018 8.65e-06 ***
ysdeg       -0.12457    0.07749  -1.608    0.115
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.398 on 45 degrees of freedom
Multiple R-squared:  0.855,  Adjusted R-squared:  0.8357
F-statistic: 44.24 on 6 and 45 DF,  p-value: < 2.2e-16
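Each `t value` in the coefficient table is Estimate / Std. Error, and each two-sided p-value is $2P(T_{45} > |t|)$. The residual standard error is $\sqrt{RSS/45}$, with $RSS = 258.86$ taken from the ANOVA table for the same model. The sketch reproduces a few entries from the printed (rounded) numbers:

```r
est <- c(degreePhD = 1.38861, sexFemale = 1.16637, ysdeg = -0.12457)
se  <- c(degreePhD = 1.01875, sexFemale = 0.92557, ysdeg = 0.07749)
res_df <- 45

t_val <- est / se
p_val <- 2 * pt(abs(t_val), df = res_df, lower.tail = FALSE)
print(round(cbind(t = t_val, p = p_val), 4))

# Residual standard error from the RSS of model (ANOVA table above)
sigma_hat <- sqrt(258.86 / res_df)
print(sigma_hat)
```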

> par(mfrow=c(2,2))

> plot(model)


[Figure: 2×2 diagnostic plots for model: Residuals vs Fitted, Normal Q−Q (against Theoretical Quantiles), Scale−Location (against Fitted values) and Residuals vs Leverage; observations 24 and 29 are labelled.]



> model3 <- step(model)
Start:  AIC=97.46
salary ~ degree + rank + sex + year + ysdeg

         Df Sum of Sq    RSS     AIC
- sex     1      9.13 267.99  97.265
<none>                258.86  97.462
- degree  1     10.69 269.55  97.566
- ysdeg   1     14.87 273.73  98.366
- year    1    144.87 403.73 118.574
- rank    2    399.79 658.65 142.025

Step:  AIC=97.27
salary ~ degree + rank + year + ysdeg

         Df Sum of Sq    RSS     AIC
- degree  1      6.68 274.68  96.547
- ysdeg   1      7.87 275.87  96.771
<none>                267.99  97.265
- year    1    147.64 415.64 118.085
- rank    2    404.11 672.10 141.077
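For `lm` fits, `step()` ranks candidate deletions by `extractAIC`, which computes $n\log(RSS/n) + 2\,\mathrm{edf}$, where edf is the number of estimated coefficients. The sketch reproduces the AIC column of the first step from the printed RSS values. It assumes $n = 52$, inferred from the 45 residual degrees of freedom plus the 7 coefficients of the full model. Dropping `rank` removes 2 coefficients, because `rank` has three levels.

```r
n <- 52   # inferred: 45 residual df + 7 coefficients in the full model
rss <- c(none = 258.86, sex = 267.99, degree = 269.55,
         ysdeg = 273.73, year = 403.73, rank = 658.65)
edf <- c(none = 7, sex = 6, degree = 6, ysdeg = 6, year = 6, rank = 5)

aic <- n * log(rss / n) + 2 * edf
print(round(aic, 3))
```

Removing `sex` gives the lowest AIC, so `step()` drops it first. This is consistent with the non-significant sex terms in the summary and the ANOVA comparison above.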