
Semester 2 Assessment, 2020

MAST30027 Modern Applied Statistics

Question 1 (7 marks)

Let $X_1, \dots, X_n$ be independent samples from a Normal distribution $N(1, 1/\tau)$ with pdf

$f(x; \tau) = \sqrt{\frac{\tau}{2\pi}}\, e^{-\frac{\tau}{2}(x - 1)^2}$.

(a) What is the log-likelihood for this example?

(b) What is the Fisher information for this example?
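
A hand derivation for this question can be sanity-checked numerically. The sketch below assumes the density is that of a normal distribution with mean 1 and precision $\tau$ (variance $1/\tau$). The function names, the data `x_demo` and the value `tau_demo` are hypothetical and chosen only for illustration.

```r
# Assumption: X_i ~ N(1, 1/tau), i.e. mean 1 and precision tau.
loglik_tau <- function(tau, x) {
  sum(dnorm(x, mean = 1, sd = 1 / sqrt(tau), log = TRUE))
}

# The same quantity written out from the density:
# (n/2) log(tau) - (n/2) log(2 pi) - (tau/2) sum((x_i - 1)^2)
loglik_closed <- function(tau, x) {
  n <- length(x)
  (n / 2) * log(tau) - (n / 2) * log(2 * pi) - (tau / 2) * sum((x - 1)^2)
}

x_demo   <- c(0.4, 1.3, 2.1, 0.9)  # hypothetical data, illustration only
tau_demo <- 2                      # hypothetical precision value

# Both versions should agree up to floating-point error.
same_value <- isTRUE(all.equal(loglik_tau(tau_demo, x_demo),
                               loglik_closed(tau_demo, x_demo)))

# Central finite difference for the second derivative with respect to tau.
h  <- 1e-3
d2 <- (loglik_closed(tau_demo + h, x_demo) -
       2 * loglik_closed(tau_demo, x_demo) +
       loglik_closed(tau_demo - h, x_demo)) / h^2

# Under the stated assumption, minus the second derivative is n / (2 tau^2).
info_closed <- length(x_demo) / (2 * tau_demo^2)
```

Under this parameterisation the second derivative does not involve the data, so the observed and expected information coincide, and `-d2` should be close to `info_closed`.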

Question 2 (9 marks)

The dvisits data in the faraway package comes from the Australian Health Survey of 1977–78 and consists of 5190 observations on single adults, where young and old have been oversampled. Here, we consider doctorco as a response and sex, age, income, levyplus, freepoor, freerepa, illness, actdays as predictor variables. The description of each variable is as follows.

• doctorco: number of consultations with a doctor or specialist in the past 2 weeks

• sex: 1 if female, 0 if male

• age: age in years divided by 100

• income: annual income in Australian dollars divided by 1000

• levyplus: 1 if covered by a private health insurance fund for private patients in a public hospital (with doctor of choice), 0 otherwise

• freepoor: 1 if covered by government because of low income, recent immigrant, or unemployed, 0 otherwise

• freerepa: 1 if covered by government because of old-age or disability pension, or because of invalid veteran or family of deceased veteran, 0 otherwise

• illness: number of illnesses in past 2 weeks, with 5 or more coded as 5

• actdays: number of days of reduced activity in past two weeks due to illness or injury

Examine the R code and output below, and then answer the questions that follow.

> library(faraway)
> data(dvisits)
> modelA <- glm(doctorco ~ sex + age + income + levyplus + freepoor
+               + freerepa + illness + actdays,
+               family=quasipoisson, data=dvisits)
> summary(modelA)

Call:
glm(formula = doctorco ~ sex + age + income + levyplus + freepoor +
    freerepa + illness + actdays, family = quasipoisson, data = dvisits)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.7696  -0.6865  -0.5773  -0.4906   5.5745

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.055666   0.115808 -17.751   <2e-16 ***
sex          0.163442   0.064454   2.536   0.0112 *
age          0.296311   0.186496   1.589   0.1122
income      -0.195493   0.098511  -1.984   0.0473 *
levyplus     0.143743   0.082153   1.750   0.0802 .
freepoor    -0.404611   0.206938  -1.955   0.0506 .
freerepa     0.118603   0.105656   1.123   0.2617
illness      0.211644   0.019482  10.864   <2e-16 ***
actdays      0.133576   0.005264  25.377   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 1.328231)

    Null deviance: 5634.8  on 5189  degrees of freedom
Residual deviance: 4394.3  on 5181  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 6

> modelB <- glm(doctorco ~ sex + age + income,
+               family=quasipoisson, data=dvisits)
> anova(modelB, modelA, test="F")
Analysis of Deviance Table

Model 1: doctorco ~ sex + age + income
Model 2: doctorco ~ sex + age + income + levyplus + freepoor + freerepa +
    illness + actdays
  Resid. Df Resid. Dev Df Deviance      F    Pr(>F)
1      5186     5434.9
2      5181     4394.3  5   1040.5 156.68 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> modelC <- glm(doctorco ~ sex + age + income + levyplus + freepoor
+               + freerepa + illness + actdays,
+               family=poisson, data=dvisits)
> modelD <- glm(doctorco ~ sex + age + income,
+               family=poisson, data=dvisits)
> summary(modelD)

Call:
glm(formula = doctorco ~ sex + age + income, family = poisson,
    data = dvisits)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0350  -0.8031  -0.6749  -0.6069   6.3695

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.71473    0.09118 -18.805  < 2e-16 ***
sex          0.21565    0.05589   3.859 0.000114 ***
age          1.23798    0.13013   9.514  < 2e-16 ***
income      -0.27726    0.07969  -3.479 0.000502 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 5634.8  on 5189  degrees of freedom
Residual deviance: 5434.9  on 5186  degrees of freedom
AIC: 7774.4

Number of Fisher Scoring iterations: 6

> (phiC <- sum(residuals(modelC, type="pearson")^2/modelC$df.residual))
[1] 1.328231
> (phiD <- sum(residuals(modelD, type="pearson")^2/modelD$df.residual))
[1] 2.027811

(a) Do you prefer modelA or modelB, and why?

(b) Show how the F-statistic has been calculated in the analysis of deviance. What are the degrees of freedom for the F-statistic?

(c) Give the estimate of the coefficient of illness and its standard error for modelC.

(d) For modelD, what is the log-likelihood of the fitted model, and the log-likelihood of the full (saturated) model?
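
The arithmetic behind several of these questions can be reproduced from the printed output alone, without refitting the models. The sketch below is one way to organise it, not a model answer. All numbers are copied from the output above, the variable names are hypothetical, and it relies on three standard facts: the quasi-Poisson F-statistic divides the scaled drop in deviance by the dispersion estimate of the larger model, quasi-Poisson standard errors are the Poisson ones multiplied by the square root of the dispersion, and for a Poisson GLM the AIC is -2 log-likelihood + 2p while the deviance is twice the gap to the saturated log-likelihood.

```r
# Quantities copied from the printed output.
dev_B <- 5434.9; df_B <- 5186   # residual deviance and df, modelB
dev_A <- 4394.3; df_A <- 5181   # residual deviance and df, modelA
phi_A <- 1.328231               # dispersion estimate of the larger model

# F-statistic for the nested quasi-Poisson comparison.
df_num  <- df_B - df_A          # numerator degrees of freedom
df_den  <- df_A                 # denominator degrees of freedom
F_stat  <- ((dev_B - dev_A) / df_num) / phi_A
p_value <- pf(F_stat, df_num, df_den, lower.tail = FALSE)

# modelC has the same point estimates as modelA; its standard errors are
# the quasi-Poisson ones divided by sqrt(dispersion).
est_illness  <- 0.211644
se_illness_A <- 0.019482
se_illness_C <- se_illness_A / sqrt(phi_A)

# modelD: recover the fitted log-likelihood from the AIC (p = 4 coefficients),
# then the saturated log-likelihood from the residual deviance.
aic_D      <- 7774.4
p_D        <- 4
dev_D      <- 5434.9
loglik_D   <- -(aic_D - 2 * p_D) / 2
loglik_sat <- loglik_D + dev_D / 2
```

Because the inputs are rounded as printed, `F_stat` matches the table value only to about two decimal places.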

Question 3 (10 marks)

We assume that the observed data, $X_1 = 1$, $X_2 = 2$, $X_3 = 4$, follow a mixture of two Poisson distributions. Specifically, for $i = 1, 2, 3$,

$Z_i \sim \text{categorical}(\pi, 1 - \pi)$,

$X_i \mid Z_i = 1 \sim \text{Poisson}(\lambda_1 = 1.2)$ and $X_i \mid Z_i = 2 \sim \text{Poisson}(\lambda_2 = 2.5)$, where the Poisson distribution has the probability mass function

$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$.

Assume that we derived and implemented the EM algorithm to obtain the MLE of the parameter $\pi$.

(a) Assume we ran the EM algorithm two times with different initial values. The following table shows the two estimates returned by the different runs of the EM algorithm. Which estimate should we use as the MLE of the parameter $\pi$? Why?

                    run 1   run 2
initial value       0.1     0.9
estimate for $\pi$  0.3     0.4

(b) If $P(Z_i = 1 \mid X_i) > 0.5$, we assign a sample $X_i$ to cluster 1, and to cluster 2 otherwise. Using the MLE from part (a), compute $P(Z_2 = 1 \mid X_2 = 2)$ and $P(Z_2 = 2 \mid X_2 = 2)$. Which cluster is $X_2 = 2$ assigned to?
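
Both parts of this question reduce to two small computations: the observed-data log-likelihood as a function of the mixing weight, and the posterior cluster probability from Bayes' rule. EM may stop at different points from different starting values, so a common way to choose between runs is to keep the estimate with the larger observed-data log-likelihood. The sketch below is only an illustration. The function names are hypothetical, and `pi_demo` is a placeholder value, not an estimate taken from the table.

```r
x_obs   <- c(1, 2, 4)   # observed data from the question
lambda1 <- 1.2          # Poisson mean of component 1
lambda2 <- 2.5          # Poisson mean of component 2

# Observed-data log-likelihood for a given mixing weight pi1 = P(Z = 1).
mix_loglik <- function(pi1) {
  sum(log(pi1 * dpois(x_obs, lambda1) + (1 - pi1) * dpois(x_obs, lambda2)))
}

# Posterior probability that an observation x came from component 1.
post_cluster1 <- function(x, pi1) {
  num <- pi1 * dpois(x, lambda1)
  num / (num + (1 - pi1) * dpois(x, lambda2))
}

pi_demo <- 0.5                        # placeholder weight, illustration only
p1      <- post_cluster1(2, pi_demo)  # P(Z = 1 | X = 2) under pi_demo
p2      <- 1 - p1                     # P(Z = 2 | X = 2) under pi_demo
cluster <- if (p1 > 0.5) 1 else 2     # assignment rule from the question
```

To compare two EM runs, call `mix_loglik()` on each returned estimate and keep the one with the larger value, then pass that estimate to `post_cluster1()` in place of `pi_demo`.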

Question 4 (18 marks)

Model: We assume that $x$ and $y$ are independent and follow normal distributions: $x \sim N(\mu_1, 1^2)$, $y \sim N(\mu_2, 1^2)$.

Prior: We impose the following bivariate normal prior for the mean parameters:

$\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \sim N(\mu, \Sigma)$ with $\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and $\Sigma$ = ( ).

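Recall that $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim N(\mu, \Sigma)$ with $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and $\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}$, iff $x$ has joint density

$f_{\mu, \Sigma}(x) = \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$.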

(a) We wish to perform posterior inference using Gibbs sampling. Derive the conditional distribution

$p(\mu_1 \mid \mu_2, x, y)$.

If it is a known distribution, identify the distribution name and its parameters.
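
One possible route to this full conditional is sketched below in symbolic form. It assumes that the observation $x$ has unit variance, and it writes the prior covariance entries generically as $\sigma_1^2$, $\sigma_2^2$, $\sigma_{12}$, as in the recalled bivariate normal density. The symbols $m$ and $v$ are introduced here only for convenience.

```latex
\begin{align*}
p(\mu_1 \mid \mu_2, x, y)
  &\propto p(x \mid \mu_1)\, p(\mu_1 \mid \mu_2), \\
\mu_1 \mid \mu_2 &\sim N(m, v), \qquad
  m = \frac{\sigma_{12}}{\sigma_2^2}\,\mu_2, \qquad
  v = \sigma_1^2 - \frac{\sigma_{12}^2}{\sigma_2^2}, \\
p(\mu_1 \mid \mu_2, x, y)
  &\propto \exp\left\{ -\tfrac{1}{2}(x - \mu_1)^2
                       - \tfrac{1}{2v}(\mu_1 - m)^2 \right\}, \\
\mu_1 \mid \mu_2, x, y
  &\sim N\!\left( \frac{v x + m}{v + 1},\; \frac{v}{v + 1} \right).
\end{align*}
```

The observation $y$ drops out because, given $\mu_2$, it carries no further information about $\mu_1$.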

(b) We wish to perform posterior inference using the Metropolis-Hastings algorithm. For the current values of the parameters $(\mu_1^{(c)}, \mu_2^{(c)})$, we propose new values $(\mu_1^{(n)}, \mu_2^{(n)})$ as follows: $\mu_1^{(n)} \sim N(0, 1^2)$ and $\mu_2^{(n)} \sim N(\mu_2^{(c)}, 1^2)$. Compute the acceptance probability when $(\mu_1^{(c)}, \mu_2^{(c)}) = (2, 0)$, $(\mu_1^{(n)}, \mu_2^{(n)}) = (3, 1)$, $x = 1$, $y = 0$.
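
The structure of the acceptance probability can be sketched as follows, assuming unit variances in both the likelihood and the proposals. Here $p(\mu_1, \mu_2)$ denotes the bivariate normal prior density, and $r_{\text{target}}$ and $r_{\text{prop}}$ are names introduced only for this sketch. The random-walk proposal for $\mu_2$ is symmetric and cancels, while the independence proposal for $\mu_1$ does not.

```latex
\begin{align*}
r_{\text{target}}
  &= \frac{p(x \mid \mu_1^{(n)})\, p(y \mid \mu_2^{(n)})\, p(\mu_1^{(n)}, \mu_2^{(n)})}
          {p(x \mid \mu_1^{(c)})\, p(y \mid \mu_2^{(c)})\, p(\mu_1^{(c)}, \mu_2^{(c)})}, \\
r_{\text{prop}}
  &= \frac{q(\mu^{(c)} \mid \mu^{(n)})}{q(\mu^{(n)} \mid \mu^{(c)})}
   = \exp\left\{ \tfrac{1}{2}\left[ (\mu_1^{(n)})^2 - (\mu_1^{(c)})^2 \right] \right\}, \\
\alpha &= \min\left\{ 1,\; r_{\text{target}} \cdot r_{\text{prop}} \right\}.
\end{align*}
```

With the values given in the question, the likelihood part of $r_{\text{target}}$ is $e^{-2}$ and $r_{\text{prop}} = e^{2.5}$, so $\alpha = \min\{1,\, e^{0.5} \cdot p(3, 1) / p(2, 0)\}$, where the remaining prior ratio depends on $\Sigma$.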

(c) We wish to perform posterior inference using variational inference with the mean-field variational family, where $q(\mu_1, \mu_2) = q_1(\mu_1) q_2(\mu_2)$, and use the CAVI algorithm for optimisation. The CAVI algorithm iteratively optimises each factor as follows while holding the other factor fixed:

$q_1^{*}(\mu_1) \propto \exp\{E_{\mu_2}[\log p(\mu_1, \mu_2, x, y)]\}$,

$q_2^{*}(\mu_2) \propto \exp\{E_{\mu_1}[\log p(\mu_1, \mu_2, x, y)]\}$,

where the expectations $E_{\mu_2}$ and $E_{\mu_1}$ are taken with respect to $q_2^{*}(\mu_2)$ and $q_1^{*}(\mu_1)$, respectively. Derive $q_1^{*}(\mu_1)$ and $q_2^{*}(\mu_2)$, and identify the corresponding distribution names and their parameters.
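
The shape of the CAVI updates can be sketched symbolically. The sketch assumes unit-variance likelihoods and a zero-mean prior, and it writes $\Lambda = \Sigma^{-1}$ for the prior precision matrix with entries $\Lambda_{11}$, $\Lambda_{12}$, $\Lambda_{22}$. This notation is introduced here and does not appear in the question.

```latex
\begin{align*}
\log p(\mu_1, \mu_2, x, y)
  &= -\tfrac{1}{2}(x - \mu_1)^2 - \tfrac{1}{2}(y - \mu_2)^2
     - \tfrac{1}{2}\left( \Lambda_{11}\mu_1^2 + 2\Lambda_{12}\mu_1\mu_2
     + \Lambda_{22}\mu_2^2 \right) + \text{const}, \\
\log q_1^{*}(\mu_1)
  &= -\tfrac{1}{2}(1 + \Lambda_{11})\,\mu_1^2
     + \left( x - \Lambda_{12}\, E[\mu_2] \right)\mu_1 + \text{const}, \\
q_1^{*}(\mu_1)
  &= N\!\left( \frac{x - \Lambda_{12}\, E[\mu_2]}{1 + \Lambda_{11}},\;
               \frac{1}{1 + \Lambda_{11}} \right), \\
q_2^{*}(\mu_2)
  &= N\!\left( \frac{y - \Lambda_{12}\, E[\mu_1]}{1 + \Lambda_{22}},\;
               \frac{1}{1 + \Lambda_{22}} \right).
\end{align*}
```

Each update only needs the current mean of the other factor, so the algorithm alternates between the two mean updates until they stop changing.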

2022-11-01
