MAST30025 Linear Statistical Models Semester 1 Assessment, 2021
Question 1 (10 marks)
Let $A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}$ and $B = \begin{pmatrix} 5 & 4 \\ 10 & 8 \end{pmatrix}$.
(a) [3 marks] Show that A is positive definite.
(b) [3 marks] Find all possible values of $c$ such that $c(A - I)$ is idempotent.
(c) [2 marks] Find a conditional inverse for $B$.
(d) [2 marks] Let $\mathbf{y} = (y_1, y_2)^T$. Find
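The following R sketch checks parts (a) to (c) numerically. It assumes the matrices read as $A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}$ and $B = \begin{pmatrix} 5 & 4 \\ 10 & 8 \end{pmatrix}$ (so that $B$ is singular and genuinely needs a conditional inverse); that reading of the statement is an assumption. For (b), $(A - I)^2 = 8(A - I)$, so $c^2(A - I)^2 = c(A - I)$ reduces to $8c^2 = c$, i.e. $c = 0$ or $c = 1/8$.

```r
# ASSUMED reading of the matrices in Question 1 (an interpretation, not the official statement).
A <- matrix(c(5, 4,
              4, 5), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 4,
              10, 8), nrow = 2, byrow = TRUE)
I2 <- diag(2)

# (a) A symmetric matrix is positive definite iff all of its eigenvalues are positive.
ev <- eigen(A, symmetric = TRUE)$values   # returned in decreasing order
is_pd <- isSymmetric(A) && all(ev > 0)

# (b) (A - I)^2 = 8 (A - I), so c(A - I) is idempotent iff 8 c^2 = c: c = 0 or c = 1/8.
M <- A - I2
is_idempotent <- function(P) isTRUE(all.equal(P %*% P, P))
idem_ok <- sapply(c(0, 1/8), function(cc) is_idempotent(cc * M))

# (c) B has rank 1. Inverting the nonzero 1x1 minor B[1, 1] = 5 and padding with
# zeros gives a conditional inverse Bc, i.e. B %*% Bc %*% B reproduces B.
Bc <- matrix(c(1/5, 0,
               0,   0), nrow = 2, byrow = TRUE)
cond_ok <- isTRUE(all.equal(B %*% Bc %*% B, B))

c(is_pd = is_pd, idempotent = all(idem_ok), conditional_inverse = cond_ok)
```

Any nonsingular $r \times r$ minor of a rank-$r$ matrix can be used this way, so the conditional inverse is not unique.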
Question 2 (13 marks)
Let $\mathbf{y} = (y_1, y_2, y_3)^T \sim MVN(\boldsymbol{\mu}, \Sigma)$ and $\mathbf{x} = (y_1, y_2)^T \sim MVN(\mathbf{u}, V)$.
(a) [2 marks] Find $\mathbf{u}$ and $V$.
(b) [3 marks] Describe the distribution of …
(c) [4 marks] Describe the distribution of $(y_1^2 + y_2^2 + y_1 y_2)$.
(d) [4 marks] Describe the distribution of $(y_1^2 + y_2^2 + y_1 y_2) + y_3^2$.
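This R sketch uses hypothetical values of $\boldsymbol{\mu}$ and $\Sigma$ (not the question's) to illustrate the two tools that parts (a) and (c) rely on. A marginal of a multivariate normal is obtained by taking the matching sub-vector of the mean and sub-block of the covariance. A quadratic form $\mathbf{x}^T A \mathbf{x}$ with $\mathbf{x} \sim MVN(\mathbf{u}, V)$ is $\chi^2$ when $AV$ is idempotent; here $y_1^2 + y_2^2 + y_1 y_2 = \mathbf{x}^T A \mathbf{x}$ with $A = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}$. The helper names `marginal()` and `quad_form_chisq()` are hypothetical.

```r
# Mean and covariance of the sub-vector y[idx] when y ~ MVN(mu, Sigma).
marginal <- function(mu, Sigma, idx) {
  list(mean = mu[idx], cov = Sigma[idx, idx, drop = FALSE])
}

# For y ~ MVN(mu, V), V positive definite, A symmetric: if A V is idempotent then
# y'Ay ~ chi^2(k, lambda) with k = rank(A). lambda uses the convention mu'A mu / 2
# (some texts omit the 1/2). Returns NULL when A V is not idempotent.
quad_form_chisq <- function(A, mu, V) {
  AV <- A %*% V
  if (!isTRUE(all.equal(AV %*% AV, AV))) return(NULL)
  list(df = qr(A)$rank, ncp = 0.5 * drop(t(mu) %*% A %*% mu))
}

# HYPOTHETICAL mu and Sigma, for illustration only.
mu <- c(1, 0, 2)
Sigma <- matrix(c( 2, -1, 0,
                  -1,  2, 0,
                   0,  0, 1), nrow = 3, byrow = TRUE)

m <- marginal(mu, Sigma, 1:2)   # u = (1, 0)', V = [2 -1; -1 2]

# y1^2 + y2^2 + y1*y2 = x'Ax with x = (y1, y2)'
A <- matrix(c(1,   0.5,
              0.5, 1), nrow = 2, byrow = TRUE)
raw    <- quad_form_chisq(A, m$mean, m$cov)            # A V = 1.5 I, not idempotent: NULL
scaled <- quad_form_chisq((2/3) * A, m$mean, m$cov)    # (2/3) A V = I, idempotent
```

With these hypothetical values the raw form is not $\chi^2$, but $\tfrac{2}{3}(y_1^2 + y_2^2 + y_1 y_2)$ is $\chi^2$ with 2 degrees of freedom; the same check applies to the actual $\boldsymbol{\mu}$ and $\Sigma$.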
Question 3 (17 marks)
In this question, we study the number of sales of hot dogs in a stadium over 20 days. This dataset contains the variables:
• sales ($y$): the number of hot dogs sold on the day
• temp ($x_1$): the maximum temperature on the day
• nppl ($x_2$): the number of people in the stadium on the day (in thousands).
The data are stored in the saledata data frame, and the following R calculations are performed:
> X = cbind(rep(1,n), saledata$temp, saledata$nppl)
> y = saledata$sales
>
> t(X)%*%X
[,1] [,2] [,3]
[1,] 20.0 -4.8 33.8
[2,] -4.8 1104.8 6.6
[3,] 33.8 6.6 70.0
> solve(t(X)%*%X)
[,1] [,2] [,3]
[1,] 0.278 0.00202 -0.1345
[2,] 0.002 0.00092 -0.0011
[3,] -0.134 -0.00106 0.0794
> t(X)%*%y
[,1]
[1,] 319
[2,] 2381
[3,] 661
> t(y)%*%y
[,1]
[1,] 11271
> qt(0.975,15:20)
[1] 2.131 2.120 2.110 2.101 2.093 2.086
> qf(0.95,1,15:20)
[1] 4.543 4.494 4.451 4.414 4.381 4.351
> qf(0.95,2,15:20)
[1] 3.682 3.634 3.592 3.555 3.522 3.493
> qf(0.95,3,15:20)
[1] 3.287 3.239 3.197 3.160 3.127 3.098
(a) [2 marks] Calculate the least squares estimates of the parameters.
(b) [2 marks] Calculate the sample variance $s^2$.
(c) [2 marks] Calculate a 95% confidence interval for the parameter corresponding to temp.
(d) [3 marks] Test for model relevance at the 5% significance level.
(e) [4 marks] Consider the linear model without an intercept,
$y = \beta_1 x_1 + \beta_2 x_2 + \varepsilon$.
Calculate the least squares estimates of the parameters.
(f) [4 marks] Test the hypothesis $H_0: \beta_0 = 0$ in the full model (including the intercept) using an F-test at the 5% significance level.
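A minimal R sketch of parts (a) to (f), working only from the printed summaries $X^TX$, $X^T\mathbf{y}$ and $\mathbf{y}^T\mathbf{y}$. It recomputes $(X^TX)^{-1}$ with `solve()` instead of using the rounded inverse printed above, so the digits may differ slightly from a hand calculation. Critical values are taken from the printed `qt`/`qf` table with $n - p = 20 - 3 = 17$ residual degrees of freedom.

```r
# Summary statistics printed in the question: n = 20 days, p = 3 parameters.
n <- 20; p <- 3
XtX <- matrix(c( 20.0,   -4.8, 33.8,
                 -4.8, 1104.8,  6.6,
                 33.8,    6.6, 70.0), nrow = 3, byrow = TRUE)
Xty <- c(319, 2381, 661)
yty <- 11271
t_17  <- 2.110   # qt(0.975, 17), from the printed table
F2_17 <- 3.592   # qf(0.95, 2, 17)
F1_17 <- 4.451   # qf(0.95, 1, 17)

# (a) Least squares estimates solve the normal equations X'X b = X'y.
b <- drop(solve(XtX, Xty))

# (b) SSRes = y'y - b'X'y and s^2 = SSRes / (n - p).
ss_res <- yty - sum(b * Xty)
s2 <- ss_res / (n - p)

# (c) 95% CI for the temp coefficient: b[2] +/- t * sqrt(s^2 * [(X'X)^{-1}]_{22}).
C <- solve(XtX)
ci_temp <- b[2] + c(-1, 1) * t_17 * sqrt(s2 * C[2, 2])

# (d) Model relevance, H0: beta_1 = beta_2 = 0. The first entry of X'y is sum(y),
# so SSReg = b'X'y - sum(y)^2 / n and F = [SSReg / (p - 1)] / s^2.
ss_reg <- sum(b * Xty) - Xty[1]^2 / n
F_rel <- (ss_reg / (p - 1)) / s2

# (e) No-intercept model: drop the intercept row/column of X'X and entry of X'y.
b_noint <- drop(solve(XtX[-1, -1], Xty[-1]))
ss_res_noint <- yty - sum(b_noint * Xty[-1])

# (f) H0: beta_0 = 0 by extra sum of squares (1 restriction):
# F = (SSRes_reduced - SSRes_full) / 1 / s^2.
F_int <- (ss_res_noint - ss_res) / s2

round(c(b = b, s2 = s2, F_rel = F_rel, F_int = F_int), 3)
```

Each test rejects $H_0$ when the statistic exceeds the matching printed critical value; for (f), the square of the usual $t$ statistic for $\beta_0$ gives the same test up to rounding.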
Question 4 (8 marks)
Consider the general full-rank linear model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. For given inputs $\mathbf{x}^*$, we wish to predict the difference of two responses with these inputs,
$y_1^* = (\mathbf{x}^*)^T \boldsymbol{\beta} + \varepsilon_1^*$,
$y_2^* = (\mathbf{x}^*)^T \boldsymbol{\beta} + \varepsilon_2^*$,
where $\varepsilon_1^* \sim N(0, \sigma^2)$ and $\varepsilon_2^* \sim N(0, \sigma^2)$ are independent. Let $\mathbf{b}$ be the least squares estimator of $\boldsymbol{\beta}$ and $s^2$ the sample variance.
(a) [1 mark] Write down the predictor of $y_1^* - y_2^*$.
(b) [2 marks] Find the distribution of the prediction error.
(c) [3 marks] Formulate a t-distributed random variable based on the prediction error, and specify its degrees of freedom.
(d) [2 marks] Based on (c), construct a $100(1 - \alpha)\%$ prediction interval for $y_1^* - y_2^*$.
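A Monte Carlo sketch in R, under hypothetical choices of $X$, $\boldsymbol{\beta}$ and $\sigma$, and assuming the new errors are independent of the data used to fit the model. Because both responses share the mean $(\mathbf{x}^*)^T\boldsymbol{\beta}$, the predictor $(\mathbf{x}^*)^T\mathbf{b} - (\mathbf{x}^*)^T\mathbf{b}$ is 0, the prediction error $\varepsilon_1^* - \varepsilon_2^*$ is $N(0, 2\sigma^2)$, and $(y_1^* - y_2^*)/(s\sqrt{2}) \sim t_{n-p}$. With $n = 20$ and $p = 3$ the printed value `qt(0.975, 17)` $= 2.110$ applies, and the simulated coverage of $0 \pm t\,s\sqrt{2}$ should be close to 95%.

```r
set.seed(30025)

# HYPOTHETICAL full-rank design and parameters: n = 20, p = 3.
n <- 20; p <- 3; sigma <- 1.5
X <- cbind(1, rnorm(n), rnorm(n))
beta <- c(1, 2, -0.5)
t_crit <- 2.110   # qt(0.975, 17), as printed in Question 3

# Residual-maker matrix M = I - X (X'X)^{-1} X'.
M <- diag(n) - X %*% solve(crossprod(X), t(X))

reps <- 20000
# Each column of Y is one simulated response vector y = X beta + e.
Y <- drop(X %*% beta) + matrix(rnorm(n * reps, sd = sigma), nrow = n)
s2 <- colSums((M %*% Y)^2) / (n - p)   # s^2 = SSRes / (n - p) for each data set

# Two new responses at the same x*: the predictor is 0, so the
# prediction error is eps1* - eps2*, which is N(0, 2 sigma^2).
err <- rnorm(reps, sd = sigma) - rnorm(reps, sd = sigma)

# Empirical coverage of the interval 0 +/- t * s * sqrt(2).
coverage <- mean(abs(err) <= t_crit * sqrt(2 * s2))
coverage
```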
Question 5 (16 marks)
A small university collected data on the salary of its faculty members in the early 1980s. The dataset contains the following variables:
• degree: Highest qualification (Masters or PhD)
• rank: Faculty position (Asst, Assoc, or Prof)
• sex: Gender (Male or Female)
• year: Years in current rank
• ysdeg: Years since highest degree
• salary: Current salary ($k/yr)
We wish to model the salary of the faculty members in terms of the other variables. The following R calculations are produced:
> model <- lm(salary ~ ., data=salary)
> model2 <- lm(salary ~ . * sex, data=salary)
> anova(model, model2)
Analysis of Variance Table
Model 1: salary ~ degree + rank + sex + year + ysdeg
Model 2: salary ~ (degree + rank + sex + year + ysdeg) * sex
Res.Df RSS Df Sum of Sq F Pr(>F)
1 45 258.86
2 40 241.45 5 17.406 0.5767 0.7174
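The F statistic in the `anova()` table can be reproduced from its RSS and residual df columns, $F = \frac{(RSS_1 - RSS_2)/(df_1 - df_2)}{RSS_2/df_2}$. A short R check using the printed (rounded) numbers:

```r
# Numbers copied from the ANOVA table above.
rss1 <- 258.86; df1 <- 45   # Model 1: main effects only
rss2 <- 241.45; df2 <- 40   # Model 2: every term interacted with sex
F_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_val  <- pf(F_stat, df1 - df2, df2, lower.tail = FALSE)
c(F = F_stat, p = p_val)
```

The large p-value means the interactions with sex do not significantly improve the fit.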
> summary(model)
Call:
lm(formula = salary ~ ., data = salary)
Residuals:
Min 1Q Median 3Q Max
-4.0452 -1.0947 -0.3615 0.8132 9.1931
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.74605 0.80018 19.678 < 2e-16 ***
degreePhD 1.38861 1.01875 1.363 0.180
rankAssoc 5.29236 1.14540 4.621 3.22e-05 ***
rankProf 11.11876 1.35177 8.225 1.62e-10 ***
sexFemale 1.16637 0.92557 1.260 0.214
year 0.47631 0.09491 5.018 8.65e-06 ***
ysdeg -0.12457 0.07749 -1.608 0.115
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
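Several `summary()` quantities follow from the others. This R sketch recomputes the t values (estimate / standard error), adjusted $R^2 = 1 - (1 - R^2)\frac{n - 1}{n - p}$, the overall F statistic $\frac{R^2/(p-1)}{(1-R^2)/(n-p)}$ and the residual standard error $\sqrt{RSS/(n-p)}$. The sample size $n = 45 + 7 = 52$ is inferred from the residual df and the seven coefficients, and RSS is taken from the `anova()` table; small discrepancies come from rounding in the printed output.

```r
est <- c(15.74605, 1.38861, 5.29236, 11.11876, 1.16637, 0.47631, -0.12457)
se  <- c(0.80018, 1.01875, 1.14540, 1.35177, 0.92557, 0.09491, 0.07749)
t_printed <- c(19.678, 1.363, 4.621, 8.225, 1.260, 5.018, -1.608)
t_val <- est / se

p <- length(est)        # 7 coefficients including the intercept
n <- 45 + p             # residual df + coefficients = 52
R2 <- 0.855
adj_R2 <- 1 - (1 - R2) * (n - 1) / (n - p)
F_stat <- (R2 / (p - 1)) / ((1 - R2) / (n - p))
rse <- sqrt(258.86 / (n - p))   # RSS of this model, from the anova() table

round(c(adj_R2 = adj_R2, F = F_stat, rse = rse), 4)
```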
> par(mfrow=c(2,2))
> plot(model)
[Figure: diagnostic plots from plot(model), four panels: Residuals vs Fitted (x-axis: Fitted values), Normal Q−Q (x-axis: Theoretical Quantiles), Scale−Location (x-axis: Fitted values) and Residuals vs Leverage (x-axis: Leverage); observations 24 and 29 are labelled.]
> model3 <- step(model)
Start: AIC=97.46
salary ~ degree + rank + sex + year + ysdeg
         Df Sum of Sq    RSS     AIC
- sex     1      9.13 267.99  97.265
<none>                258.86  97.462
- degree  1     10.69 269.55  97.566
- ysdeg   1     14.87 273.73  98.366
- year    1    144.87 403.73 118.574
- rank    2    399.79 658.65 142.025
Step: AIC=97.27
salary ~ degree + rank + year + ysdeg
         Df Sum of Sq    RSS     AIC
- degree  1      6.68 274.68  96.547
- ysdeg   1      7.87 275.87  96.771
<none>                267.99  97.265
- year    1    147.64 415.64 118.085
- rank    2    404.11 672.10 141.077
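For `lm` fits, `step()` compares models with `extractAIC()`, which equals $n \log(RSS/n) + 2\,\mathrm{edf}$ (additive constants dropped). This R sketch reproduces the AIC column of the first step with $n = 52$, where edf counts the coefficients left after each deletion (rank accounts for two of them), and shows why sex is removed first.

```r
# step() ranks lm fits with extractAIC(), i.e. n * log(RSS / n) + 2 * edf.
n <- 52
aic_lm <- function(rss, edf) n * log(rss / n) + 2 * edf

# First step: the full model has 7 coefficients; dropping rank removes 2.
rss <- c(none = 258.86, sex = 267.99, degree = 269.55,
         ysdeg = 273.73, year = 403.73, rank = 658.65)
edf <- c(7, 6, 6, 6, 6, 5)
aic <- aic_lm(rss, edf)
printed <- c(97.462, 97.265, 97.566, 98.366, 118.574, 142.025)

round(aic, 3)
names(which.min(aic))   # the deletion with the lowest AIC is taken
```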
2022-05-28