
Semester 1 Assessment, 2020

MAST30025 Linear Statistical Models

Question 1 (10 marks)

(a) [2 marks] Give an example of a $3 \times 3$ idempotent matrix which is not $0$ or $I_3$.
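As a quick sanity check, the sketch below verifies that one possible choice, $P = \frac{1}{3}J_3$ (every entry equal to 1/3, the projection onto the span of $(1, 1, 1)^T$), satisfies $P^2 = P$. Exact `Fraction` arithmetic avoids floating-point noise. Other valid answers exist, for example $\mathrm{diag}(1, 0, 0)$.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


# One possible answer: P = J_3 / 3.
P = [[Fraction(1, 3)] * 3 for _ in range(3)]
I3 = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
Z3 = [[Fraction(0)] * 3 for _ in range(3)]

is_idempotent = matmul(P, P) == P
print(is_idempotent, P != I3 and P != Z3)  # True True
```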

(b) [2 marks] Show that the matrix $A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$ is positive definite.
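A minimal numerical check via Sylvester's criterion (a symmetric matrix is positive definite iff all leading principal minors are strictly positive), assuming the matrix is $A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$ as reconstructed from the garbled original:

```python
# A as reconstructed from the question (assumption: entries 3, 2 / 2, 3).
A = [[3, 2],
     [2, 3]]


def leading_minors_2x2(M):
    """Return the two leading principal minors of a 2 x 2 matrix."""
    return M[0][0], M[0][0] * M[1][1] - M[0][1] * M[1][0]


m1, m2 = leading_minors_2x2(A)
is_pd = A[0][1] == A[1][0] and m1 > 0 and m2 > 0
print(m1, m2, is_pd)  # 3 5 True
```

A direct argument is also available: $y^TAy = 3y_1^2 + 4y_1y_2 + 3y_2^2 = 2(y_1 + y_2)^2 + y_1^2 + y_2^2 > 0$ for every $y \neq 0$.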

(c) [3 marks] Show directly that $\frac{\partial}{\partial y}\, y^TAy = Ay + A^Ty$.
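The identity can be checked numerically with central finite differences. The matrix $A$ (deliberately non-symmetric, so that $Ay$ and $A^Ty$ differ) and the point $y$ below are hypothetical values chosen only for illustration:

```python
def quad_form(A, y):
    """Compute y^T A y."""
    n = len(y)
    return sum(y[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def analytic_grad(A, y):
    """Ay + A^T y, the claimed derivative of y^T A y."""
    n = len(y)
    return [sum(A[i][j] * y[j] for j in range(n)) + sum(A[j][i] * y[j] for j in range(n))
            for i in range(n)]


def numeric_grad(A, y, h=1e-6):
    """Central finite differences of y^T A y with respect to each y_i."""
    g = []
    for i in range(len(y)):
        up = list(y)
        up[i] += h
        dn = list(y)
        dn[i] -= h
        g.append((quad_form(A, up) - quad_form(A, dn)) / (2 * h))
    return g


# Hypothetical non-symmetric A and evaluation point y.
A = [[1.0, 2.0, 0.0], [-1.0, 3.0, 4.0], [0.5, 0.0, 2.0]]
y = [1.0, -2.0, 0.5]
print(analytic_grad(A, y))
print(numeric_grad(A, y))
```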

(d) [3 marks] Show directly that for any $n \times k$ matrix $A$ with $n \ge k$, the matrix $I - A(A^TA)^cA^T$ has rank $n - r(A)$, where $(A^TA)^c$ denotes a conditional inverse of $A^TA$. (You may assume that this matrix is idempotent.)
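A small exact check of the rank claim for a less-than-full-rank case. The matrix $A$ below is a hypothetical one-way design (two groups of two, rank 2), and the conditional inverse is built by the usual recipe: invert a nonsingular $r \times r$ principal block of $A^TA$ and pad with zeros. Because $I - A(A^TA)^cA^T$ is idempotent, its rank also equals its trace.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def rank(M):
    """Rank by Gaussian elimination over exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


# Hypothetical 4 x 3 design of rank 2.
A = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
AtA = matmul(transpose(A), A)  # [[4, 2, 2], [2, 2, 0], [2, 0, 2]]
# Conditional inverse: invert the lower-right block diag(2, 2), pad with zeros.
G = [[0, 0, 0], [0, Fraction(1, 2), 0], [0, 0, Fraction(1, 2)]]
assert matmul(matmul(AtA, G), AtA) == AtA  # defining property of a conditional inverse

n = len(A)
H = matmul(matmul(A, G), transpose(A))
I_minus_H = [[(1 if i == j else 0) - H[i][j] for j in range(n)] for i in range(n)]
trace = sum(I_minus_H[i][i] for i in range(n))
print(rank(I_minus_H), n - rank(A), trace)
```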

Question 2 (14 marks)

(a) [5 marks] Let $X_1 \sim \chi^2_{k_1,\lambda_1}$ and $X_2 \sim \chi^2_{k_2,\lambda_2}$ be independent. Show directly that $X_1 + X_2 \sim \chi^2_{k_1+k_2,\lambda_1+\lambda_2}$.
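A sketch of the moment generating function argument, assuming the convention that $\chi^2_{k,\lambda}$ is the distribution of $\sum_{i=1}^{k} Z_i^2$ for independent $Z_i \sim N(\mu_i, 1)$ with $\lambda = \frac{1}{2}\sum_i \mu_i^2$. Under that convention the mgf is the first line below:

```latex
m_{X}(t) = (1-2t)^{-k/2}\exp\left\{\frac{2\lambda t}{1-2t}\right\}, \qquad t < \tfrac{1}{2}

m_{X_1+X_2}(t) = m_{X_1}(t)\, m_{X_2}(t)
  = (1-2t)^{-(k_1+k_2)/2}\exp\left\{\frac{2(\lambda_1+\lambda_2)\,t}{1-2t}\right\}
```

The first equality in the second line uses independence. The result is the mgf of $\chi^2_{k_1+k_2,\lambda_1+\lambda_2}$, so uniqueness of mgfs gives the claim. Under the alternative convention $\lambda = \sum_i \mu_i^2$ the exponent becomes $\lambda t/(1-2t)$, and the argument is unchanged.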

(b) [3 marks] Let $y = (y_1, y_2)^T \sim MVN(\mu, V)$, where $\mu = \ldots$ and $V = \ldots$, and let $A = \ldots$. Calculate $E[y^TAy]$.
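The standard result $E[y^TAy] = \mathrm{tr}(AV) + \mu^TA\mu$ (which needs only the mean and covariance, not normality) can be evaluated as below. The values of `mu`, `V` and `A` are hypothetical placeholders, not the ones in the paper; the cross-check expands $E[y_iy_j] = V_{ij} + \mu_i\mu_j$ term by term.

```python
# Hypothetical inputs (placeholders for the paper's mu, V and A).
mu = [1.0, -2.0]
V = [[2.0, 0.5], [0.5, 1.0]]
A = [[1.0, 1.0], [0.0, 3.0]]


def expected_quadratic_form(A, mu, V):
    """E[y^T A y] = tr(AV) + mu^T A mu for y with mean mu and covariance V."""
    n = len(mu)
    trace_AV = sum(A[i][k] * V[k][i] for i in range(n) for k in range(n))
    mu_A_mu = sum(mu[i] * A[i][j] * mu[j] for i in range(n) for j in range(n))
    return trace_AV + mu_A_mu


# Cross-check: E[y^T A y] = sum_ij A_ij (V_ij + mu_i mu_j).
direct = sum(A[i][j] * (V[i][j] + mu[i] * mu[j]) for i in range(2) for j in range(2))
print(expected_quadratic_form(A, mu, V), direct)  # 16.5 16.5
```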

(c) [3 marks] Describe the distribution of $3y_1 - 2y_2$.

(d) [3 marks] Find all values of $a$ and $b$ for which $ay_1 + by_2$ is independent of $3y_1 - 2y_2$.
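For $y \sim MVN(\mu, V)$, a linear combination $c^Ty$ is univariate normal with mean $c^T\mu$ and variance $c^TVc$, and two linear combinations of a jointly normal vector are independent exactly when their covariance $a^TVc$ is zero. The sketch below uses hypothetical `mu` and `V` (not the paper's values); the solution set is every multiple of the vector orthogonal to $Vc$.

```python
# Hypothetical mean and covariance (placeholders for the paper's values).
mu = [1.0, -2.0]
V = [[2.0, 0.5], [0.5, 1.0]]
c = [3.0, -2.0]  # coefficients of 3*y1 - 2*y2


def linear_combination(coef, mu, V):
    """Mean and variance of coef^T y when y ~ MVN(mu, V)."""
    mean = sum(ci * mi for ci, mi in zip(coef, mu))
    var = sum(coef[i] * V[i][j] * coef[j] for i in range(2) for j in range(2))
    return mean, var


def covariance(a, b, V):
    """Cov(a^T y, b^T y) = a^T V b."""
    return sum(a[i] * V[i][j] * b[j] for i in range(2) for j in range(2))


# Zero covariance <=> independence for jointly normal variables.
# (a, b)^T V c = 0 holds for (a, b) = k * (Vc[1], -Vc[0]), k any real number.
Vc = [sum(V[i][j] * c[j] for j in range(2)) for i in range(2)]
direction = [Vc[1], -Vc[0]]
print(linear_combination(c, mu, V), direction, covariance(direction, c, V))
```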

Question 3 (18 marks)

The international bank UBS produced a report on prices and earnings in major cities throughout the world. One of the variables that they measured was the price of 1kg of rice, measured in minutes of labour required for a “typical” worker to purchase the rice. This was measured in 2003 (rice2003) and again in 2009 (rice2009).

We wish to model the 2009 price in terms of the 2003 price, using the linear model $y = X\beta + \varepsilon$. The following R calculations are performed:

> UBS <- read.csv("UBSprices.csv", header=T)
> plot(UBS$rice2003, UBS$rice2009)
[Scatterplot: UBS$rice2009 against UBS$rice2003]

> plot(log(UBS$rice2003), log(UBS$rice2009))
[Scatterplot: log(UBS$rice2009) against log(UBS$rice2003); horizontal axis from 1.5 to 4.5]

> (n <- length(UBS$rice2009))
[1] 54
> X <- cbind(1, log(UBS$rice2003))
> y <- log(UBS$rice2009)
> t(X)%*%X
         [,1]     [,2]
[1,]  54.0000 151.5818
[2,] 151.5818 440.4496
> t(X)%*%y
         [,1]
[1,] 158.3701
[2,] 456.1961
> t(y)%*%y
         [,1]
[1,] 481.9005
> sum(y)
[1] 158.3701
> qt(0.975,50:55)
[1] 2.008559 2.007584 2.006647 2.005746 2.004879 2.004045
> qf(0.95,1,50:55)
[1] 4.034310 4.030393 4.026631 4.023017 4.019541 4.016195
> qf(0.95,2,50:55)
[1] 3.182610 3.178799 3.175141 3.171626 3.168246 3.164993

(Hint: To alleviate rounding error, keep as many digits in internal calculations as possible.)

(a) [2 marks] A logarithmic transformation has been applied to both variables. Give two reasons to justify this transformation.

(b) [3 marks] Calculate the least squares estimates of $\beta$.
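Following the hint about rounding, a sketch that solves $b = (X^TX)^{-1}X^Ty$ in full double precision from the summary statistics printed by R above:

```python
# Summary statistics copied from the R output above (log scale, n = 54).
XtX = [[54.0000, 151.5818],
       [151.5818, 440.4496]]
Xty = [158.3701, 456.1961]


def inv2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


# b = (X^T X)^{-1} X^T y
XtX_inv = inv2(XtX)
b = [sum(XtX_inv[i][j] * Xty[j] for j in range(2)) for i in range(2)]
print(b)  # approximately [0.747, 0.779]
```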

(c) [3 marks] Calculate the sample variance $s^2$.
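A sketch of $s^2 = SS_{Res}/(n - p)$ with $SS_{Res} = y^Ty - b^TX^Ty$, again using only the R summary statistics above ($n = 54$, $p = 2$):

```python
# Summary statistics copied from the R output above.
n, p = 54, 2
XtX = [[54.0000, 151.5818], [151.5818, 440.4496]]
Xty = [158.3701, 456.1961]
yty = 481.9005

det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
XtX_inv = [[XtX[1][1] / det, -XtX[0][1] / det],
           [-XtX[1][0] / det, XtX[0][0] / det]]
b = [sum(XtX_inv[i][j] * Xty[j] for j in range(2)) for i in range(2)]

# SS_Res = y^T y - b^T X^T y, then s^2 = SS_Res / (n - p).
ss_res = yty - sum(bi * v for bi, v in zip(b, Xty))
s2 = ss_res / (n - p)
print(ss_res, s2)  # SS_Res is about 8.37 and s^2 about 0.161
```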

(d) [4 marks] In 2003, it cost 50 minutes of labour to buy 1kg of rice in the Republic of Linearmodelstan. Calculate (with 95% probability) an interval for the 2009 price of rice (in minutes of labour) in Linearmodelstan.
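A sketch of the 95% prediction interval $x^{*T}b \pm t_{0.975,\,n-p}\, s\sqrt{1 + x^{*T}(X^TX)^{-1}x^*}$ with $x^* = (1, \log 50)^T$, computed on the log scale and then exponentiated (valid because $\exp$ is monotone). The critical value $t_{0.975,52}$ is read from the `qt(0.975,50:55)` output above.

```python
import math

n, p = 54, 2
XtX = [[54.0000, 151.5818], [151.5818, 440.4496]]
Xty = [158.3701, 456.1961]
yty = 481.9005
t_crit = 2.006647  # qt(0.975, 52): third value of qt(0.975, 50:55)

det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
XtX_inv = [[XtX[1][1] / det, -XtX[0][1] / det],
           [-XtX[1][0] / det, XtX[0][0] / det]]
b = [sum(XtX_inv[i][j] * Xty[j] for j in range(2)) for i in range(2)]
s2 = (yty - sum(bi * v for bi, v in zip(b, Xty))) / (n - p)

# New subject: 2003 price of 50 minutes, so x* = (1, log 50).
x_star = [1.0, math.log(50)]
y_hat = sum(bi * xi for bi, xi in zip(b, x_star))
lev = sum(x_star[i] * XtX_inv[i][j] * x_star[j] for i in range(2) for j in range(2))
half_width = t_crit * math.sqrt(s2 * (1 + lev))

# Prediction interval for log(price), back-transformed to minutes of labour.
lower, upper = math.exp(y_hat - half_width), math.exp(y_hat + half_width)
print(lower, upper)
```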

(e) [3 marks] Test for model relevance at the 5% significance level, using a corrected sum of squares.
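A sketch of the corrected-sum-of-squares F test: the regression sum of squares after removing the intercept is $b^TX^Ty - (\sum y)^2/n$ on $p - 1 = 1$ degree of freedom, compared against $F_{0.95;\,1,52}$ from the `qf(0.95,1,50:55)` output above.

```python
n, p = 54, 2
XtX = [[54.0000, 151.5818], [151.5818, 440.4496]]
Xty = [158.3701, 456.1961]
yty = 481.9005
sum_y = 158.3701
f_crit = 4.026631  # qf(0.95, 1, 52)

det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
XtX_inv = [[XtX[1][1] / det, -XtX[0][1] / det],
           [-XtX[1][0] / det, XtX[0][0] / det]]
b = [sum(XtX_inv[i][j] * Xty[j] for j in range(2)) for i in range(2)]

btXty = sum(bi * v for bi, v in zip(b, Xty))
ss_res = yty - btXty
ss_reg_corrected = btXty - sum_y ** 2 / n  # regression SS adjusted for the mean
F = (ss_reg_corrected / (p - 1)) / (ss_res / (n - p))
print(F, F > f_crit)  # reject H0 (model not relevant) if F exceeds the critical value
```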

(f) [3 marks] It is claimed that, on average, the price of rice in 2003 is the same as the price of rice in 2009, in terms of labour. This corresponds to the parameter value $\beta = (0, 1)^T$. Determine whether this point lies within the joint 95% confidence region for the parameters.
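A sketch of the joint confidence region check: $\beta_0$ lies in the region when $(\beta_0 - b)^TX^TX(\beta_0 - b) \le p\,s^2 F_{0.95;\,p,n-p}$, with $F_{0.95;\,2,52}$ taken from the `qf(0.95,2,50:55)` output above and the claimed value $\beta_0 = (0, 1)^T$.

```python
n, p = 54, 2
XtX = [[54.0000, 151.5818], [151.5818, 440.4496]]
Xty = [158.3701, 456.1961]
yty = 481.9005
f_crit = 3.175141  # qf(0.95, 2, 52)
beta_claim = [0.0, 1.0]

det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
XtX_inv = [[XtX[1][1] / det, -XtX[0][1] / det],
           [-XtX[1][0] / det, XtX[0][0] / det]]
b = [sum(XtX_inv[i][j] * Xty[j] for j in range(2)) for i in range(2)]
s2 = (yty - sum(bi * v for bi, v in zip(b, Xty))) / (n - p)

# Quadratic form (beta_claim - b)^T X^T X (beta_claim - b).
d = [bc - bi for bc, bi in zip(beta_claim, b)]
quad = sum(d[i] * XtX[i][j] * d[j] for i in range(2) for j in range(2))
bound = p * s2 * f_crit
inside = quad <= bound
print(quad, bound, inside)
```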

Question 4 (12 marks)

Consider the full rank linear model $y = X\beta + \varepsilon$ with $p$ parameters. Now suppose that we transform the design variables $x$ in a linear manner:

$z_i = \sum_{j=1}^{p} a_{ji}\, x_j, \quad i = 1, \ldots, p.$

(Note that the $x$ variables include the intercept term.) Now consider the linear model $y = Z\beta_2 + \varepsilon_2$, which also has $p$ parameters.

(a) [2 marks] Express the design matrix $Z$ in terms of $X$, and state a condition under which the second linear model is also full rank.

(b) [3 marks] Calculate the least squares estimators for $\beta_2$ from the second model, and express them in terms of $b$, the least squares estimators for $\beta$.

(c) [2 marks] Consider a subject with design variables $x^*$ (for the first model). Calculate a point estimate of the average response for this subject using the second model, and express it in terms of $b$.

(d) [3 marks] Calculate the sample variance for the second model, and express it in terms of the sample variance for the first model.

(e) [2 marks] Briefly discuss the implications of the results you have derived above in the context of fitting a linear model, with particular reference to variable standardisation.
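A numerical illustration of the invariance results, assuming the transformation can be written as $Z = XA$ for a nonsingular $p \times p$ matrix $A$ (this indexing of the coefficients is an assumption). Here $A$ is a hypothetical standardisation of a single predictor, $z = (1, (x - \bar{x})/s_x)$, on hypothetical toy data. The check confirms $b_2 = A^{-1}b$, identical point estimates $z^{*T}b_2 = x^{*T}b$ with $z^* = A^Tx^*$, and identical $s^2$.

```python
def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def fit(D, y):
    """Least squares for a 2-column design D; returns (estimates, s^2)."""
    DtD = [[sum(r[i] * r[j] for r in D) for j in range(2)] for i in range(2)]
    Dty = [sum(r[i] * yi for r, yi in zip(D, y)) for i in range(2)]
    inv = inv2(DtD)
    est = [sum(inv[i][j] * Dty[j] for j in range(2)) for i in range(2)]
    resid = [yi - sum(r[j] * est[j] for j in range(2)) for r, yi in zip(D, y)]
    return est, sum(e * e for e in resid) / (len(y) - 2)


# Hypothetical toy data; the first column of X is the intercept.
xs = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.1, 2.9, 5.2, 5.8, 8.1]
X = [[1.0, x] for x in xs]

# Standardisation written as Z = XA (hypothetical A).
xbar = sum(xs) / len(xs)
sd = (sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5
A = [[1.0, -xbar / sd],
     [0.0, 1.0 / sd]]
Z = [[sum(row[k] * A[k][j] for k in range(2)) for j in range(2)] for row in X]

b, s2 = fit(X, y)
b2, s2_2 = fit(Z, y)
A_inv = inv2(A)
b2_from_b = [sum(A_inv[i][j] * b[j] for j in range(2)) for i in range(2)]  # A^{-1} b

# Point estimate for a hypothetical new subject x* = (1, 3), with z* = A^T x*.
x_star = [1.0, 3.0]
z_star = [sum(x_star[k] * A[k][j] for k in range(2)) for j in range(2)]
print(b2, b2_from_b, s2, s2_2)
```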

Question 5 (12 marks)

Consider the general linear model $y = X\beta + \varepsilon$. This model may be of full or less than full rank.

(a) [2 marks] Define the term BLUE (best linear unbiased estimator), and give an example of when one might choose not to use the BLUE.

(b) [2 marks] Describe how the parameters $\beta$ of a linear model may be estimated by the method of maximum likelihood, and relate this to least squares estimation.
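A sketch of the connection, assuming normal errors $\varepsilon \sim MVN(0, \sigma^2 I)$ and writing $r$ for the rank of $X$. The log-likelihood depends on $\beta$ only through the residual sum of squares, so maximising it over $\beta$ is the same as least squares:

```latex
\ell(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta)

\frac{\partial \ell}{\partial \beta} = \frac{1}{\sigma^2}\left(X^T y - X^T X\beta\right) = 0
  \;\Longrightarrow\; X^T X b = X^T y

\hat\sigma^2_{\mathrm{ML}} = \frac{(y - Xb)^T(y - Xb)}{n} = \frac{SS_{Res}}{n}
  \quad\text{versus}\quad s^2 = \frac{SS_{Res}}{n - r}
```

The ML estimators of $\beta$ therefore solve the normal equations, while the ML estimator of $\sigma^2$ divides by $n$ and is biased, unlike $s^2$.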

(c) [2 marks] Define the Cook’s distance and explain its purpose.
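Cook's distance for case $i$ is commonly written $D_i = \frac{r_i^2}{p}\frac{h_{ii}}{1 - h_{ii}}$, with $r_i$ the standardised residual and $h_{ii}$ the leverage; equivalently, it measures how far $b$ moves when case $i$ is deleted. The sketch below, for a simple linear regression ($p = 2$) on hypothetical data, computes both forms and shows they agree.

```python
def simple_fit(xs, ys):
    """Least squares intercept and slope for y = b0 + b1 x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1


def cooks_distances(xs, ys):
    """D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), with p = 2 parameters."""
    n, p = len(xs), 2
    b0, b1 = simple_fit(xs, ys)
    resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - p)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    lev = [1 / n + (x - xbar) ** 2 / sxx for x in xs]  # hat-matrix diagonal h_ii
    return [e * e * h / (p * s2 * (1 - h) ** 2) for e, h in zip(resid, lev)]


def cooks_by_deletion(xs, ys, i):
    """D_i = (b - b_(i))^T X^T X (b - b_(i)) / (p s^2), refitting without case i."""
    n, p = len(xs), 2
    b0, b1 = simple_fit(xs, ys)
    s2 = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - p)
    c0, c1 = simple_fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    d0, d1 = b0 - c0, b1 - c1
    sx, sxx = sum(xs), sum(x * x for x in xs)  # entries of X^T X
    return (n * d0 * d0 + 2 * sx * d0 * d1 + sxx * d1 * d1) / (p * s2)


# Hypothetical data; the last case has an unusual x value (high leverage).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 7.0]
D = cooks_distances(xs, ys)
print([round(d, 3) for d in D])
```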

(d) [2 marks] Define estimability, and explain its significance for a linear model.
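One common test: $t^T\beta$ is estimable iff $t^T(X^TX)^cX^TX = t^T$ for a conditional inverse $(X^TX)^c$. The sketch applies it to a hypothetical one-way design with parameters $(\mu, \tau_1, \tau_2)$, where the treatment difference $\tau_1 - \tau_2$ and the group mean $\mu + \tau_1$ are estimable but $\tau_1$ alone is not.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Hypothetical one-way design: two groups of two, parameters (mu, tau_1, tau_2).
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
Xt = [list(col) for col in zip(*X)]
XtX = matmul(Xt, X)
# Conditional inverse: invert the nonsingular lower-right block diag(2, 2), pad with zeros.
G = [[0, 0, 0], [0, Fraction(1, 2), 0], [0, 0, Fraction(1, 2)]]
GXtX = matmul(G, XtX)


def is_estimable(t):
    """t^T beta is estimable iff t^T (X^T X)^c X^T X = t^T."""
    return matmul([t], GXtX)[0] == t


print(is_estimable([0, 1, -1]), is_estimable([0, 1, 0]), is_estimable([1, 1, 0]))
```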

(e) [2 marks] Define interaction between a categorical and a continuous predictor, and explain how to model it.
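One way to model such an interaction is $E[y] = \beta_0 + \beta_1 x + \beta_2 d + \beta_3 x d$, where $d$ is an indicator for the non-baseline level. Each level then has its own slope ($\beta_1$ versus $\beta_1 + \beta_3$), and "no interaction" means $\beta_3 = 0$. In R this corresponds to a formula such as `y ~ x * g` with the default treatment coding. The sketch builds the design rows for hypothetical data with a two-level factor.

```python
# Hypothetical data: x continuous, g a two-level factor ("A" is the baseline).
x = [1.0, 2.0, 3.0, 1.5, 2.5]
g = ["A", "A", "B", "B", "B"]


def interaction_design(x, g, baseline="A"):
    """Rows (1, x, d, x*d), where d = 1 when the level differs from the baseline."""
    rows = []
    for xi, gi in zip(x, g):
        d = 0.0 if gi == baseline else 1.0
        rows.append([1.0, xi, d, xi * d])
    return rows


for row in interaction_design(x, g):
    print(row)
```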

(f) [2 marks] Define single and double blinding, and describe their use in experimental design.

Question 6 (16 marks)

Data on 220 agricultural land sales in Minnesota over the period 2002–2011 were collected. The dataset contains the following variables:

• id: ID
• acrePrice: Sale price, in thousands of dollars per acre
• region: One of six major agricultural regions in Minnesota
• improvements: Percentage of property value in buildings
• year: Year of sale
• acres: Size of property
• tillable: Percentage of tillable area of the land
• financing: Type of financing (title transfer or seller financed)
• crpPct: Percentage of land in the US Conservation Reserve Program
• productivity: A score measuring the productivity of the land

We wish to model the selling price (acrePrice) in terms of the other variables (except id). The following R calculations are produced:

> ML <- read.csv("ML2.csv", header=T)
> interaction_model <- lm(acrePrice ~ (. - id)^2, data=ML)
> additive_model <- lm(acrePrice ~ . - id, data=ML)
> anova(additive_model, interaction_model)
Analysis of Variance Table

Model 1: acrePrice ~ (id + region + improvements + year + acres + tillable + financing + crpPct + productivity) - id
Model 2: acrePrice ~ ((id + region + improvements + year + acres + tillable + financing + crpPct + productivity) - id)^2
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    207 182.99
2    153 125.15 54    57.845 1.3096 0.1034
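The F statistic in the table above can be reproduced from the printed values as $F = \frac{\text{Sum of Sq}/\text{Df}}{RSS_2/\text{Res.Df}_2}$. With a p-value of 0.1034, the 54 interaction terms are not significant at the 5% level. A small arithmetic check (the RSS values are printed rounded to two decimal places, so the printed Sum of Sq is used):

```python
# Values copied from the anova() output above.
rss_additive, df_additive = 182.99, 207
rss_interaction, df_interaction = 125.15, 153
extra_ss = 57.845  # printed "Sum of Sq"

df_extra = df_additive - df_interaction  # number of extra interaction parameters
F = (extra_ss / df_extra) / (rss_interaction / df_interaction)
print(df_extra, round(F, 4))
```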

> selected_model <- step(additive_model)
Start: AIC=-14.52
acrePrice ~ (id + region + improvements + year + acres + tillable + financing + crpPct + productivity) - id

               Df Sum of Sq    RSS     AIC
- financing     1     1.135 184.13 -15.159
- improvements  1     1.431 184.42 -14.806
- acres         1     1.582 184.58 -14.626
<none>                      182.99 -14.519
- productivity  1     4.189 187.18 -11.540
- crpPct        1     5.001 187.99 -10.588
- tillable      1     6.770 189.76  -8.527
- region        5    64.123 247.12  41.571
- year          1   140.960 323.95 109.134

Step: AIC=-15.16
acrePrice ~ region + improvements + year + acres + tillable +
    crpPct + productivity
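For `lm` fits, `step()` ranks candidate models by `extractAIC`, which computes $n\log(RSS/n) + 2\,\mathrm{edf}$ (AIC up to an additive constant). The sketch recomputes each row of the first step from the printed RSS values, with $n = 220$ and 13 parameters in the additive model ($220 - 207$). Small discrepancies come from RSS being printed to two decimal places. Dropping `financing` gives the lowest AIC, matching the next step.

```python
import math

n = 220
full_edf = 220 - 207  # parameters in the additive model (Res.Df = 207)

# (term, Df removed, RSS, printed AIC) copied from the step() table above.
table = [
    ("- financing", 1, 184.13, -15.159),
    ("- improvements", 1, 184.42, -14.806),
    ("- acres", 1, 184.58, -14.626),
    ("<none>", 0, 182.99, -14.519),
    ("- productivity", 1, 187.18, -11.540),
    ("- crpPct", 1, 187.99, -10.588),
    ("- tillable", 1, 189.76, -8.527),
    ("- region", 5, 247.12, 41.571),
    ("- year", 1, 323.95, 109.134),
]

for term, df, rss, aic_printed in table:
    aic = n * math.log(rss / n) + 2 * (full_edf - df)
    print(f"{term:15s} {aic:9.3f} {aic_printed:9.3f}")
```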

