Semester 2 Assessment, 2017

MAST30027 Modern Applied Statistics

Question 1 (10 marks) Let $X_1, \dots, X_n$ be independent samples from a Pareto distribution Par(1, κ) with pdf $f(x \mid \kappa) = \kappa (1 + x)^{-\kappa - 1}$, $x > 0$.

(a) What is the log-likelihood for this example?

(b) What is the Fisher information for this example?

Question 2 (8 marks) The gamma distribution with shape parameter ν > 0 and rate parameter λ > 0 has the probability density function

$$f(x; \nu, \lambda) = \frac{\lambda^{\nu}}{\Gamma(\nu)} x^{\nu - 1} e^{-\lambda x}, \quad x \ge 0.$$

The mean is $\nu/\lambda$ and the variance is $\nu/\lambda^2$.
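As a quick numerical sanity check of the shape/rate parameterisation, the sketch below integrates the density for one arbitrary pair of parameter values and compares the resulting moments with $\nu/\lambda$ and $\nu/\lambda^2$. The values `nu = 3` and `lambda = 2` and the helper name `gamma_pdf` are illustrative assumptions, not part of the question.

```r
# Hypothetical parameter values, chosen only for illustration.
nu <- 3       # shape
lambda <- 2   # rate

# Gamma density in the shape/rate parameterisation, written out explicitly.
gamma_pdf <- function(x) lambda^nu / gamma(nu) * x^(nu - 1) * exp(-lambda * x)

# Total mass and first two moments by numerical integration over (0, Inf).
total <- integrate(gamma_pdf, 0, Inf)$value
m1 <- integrate(function(x) x * gamma_pdf(x), 0, Inf)$value
m2 <- integrate(function(x) x^2 * gamma_pdf(x), 0, Inf)$value

# Numerical results next to the closed-form mean and variance.
rbind(numerical = c(mean = m1, variance = m2 - m1^2),
      formula   = c(mean = nu / lambda, variance = nu / lambda^2))
```

The same density is available in R as `dgamma(x, shape = nu, rate = lambda)`.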

(a) Show that the gamma distribution is an exponential family.

(b) Obtain the canonical link. Show your work.

Question 3 (22 marks) The wavesolder data set has 48 observations of y, the number of defects, and five predictor variables: prebake, flux, speed, cooling, and temp. The data are taken from Condra, Lloyd, Reliability Improvement with Design of Experiment, CRC Press, 2001.

Examine the R code and output below, and then answer the questions that follow.

Firstly we need to combine the three replicates into a single data set, and then have a look at the data.

> rm(list=ls())

> library(faraway)

> data(wavesolder)

> y <- c(wavesolder$y1, wavesolder$y2, wavesolder$y3)

> wavesolder <- rbind(wavesolder, wavesolder, wavesolder)

> wavesolder <- wavesolder[- (1:3)]

> wavesolder$y <- y

> par(mfrow=c(2,3), mar=c(4,4,1,1))

> plot(y ~ prebake, wavesolder)

> plot(y ~ flux, wavesolder)

> plot(y ~ speed, wavesolder)

> plot(y ~ cooling, wavesolder)

> plot(y ~ temp, wavesolder)

[Figure: plots of y against prebake, flux, speed, cooling, and temp; each predictor takes the levels 1 and 2.]

> modelA <- glm(y ~ prebake + flux + speed + temp,

+ family=poisson, data=wavesolder)

> summary(modelA)

Call:

glm(formula = y ~ prebake + flux + speed + temp, family = poisson,

data = wavesolder)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-8.0503  -1.9044  -0.5489   1.8995  12.5918

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.80541    0.06948   40.38   <2e-16 ***
prebake2     0.67287    0.05374   12.52   <2e-16 ***
flux2       -0.52878    0.05262  -10.05   <2e-16 ***
speed2       1.23048    0.06076   20.25   <2e-16 ***
temp2       -0.69315    0.05392  -12.86   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1450.52  on 47  degrees of freedom
Residual deviance:  513.75  on 43  degrees of freedom
AIC: 754.49

Number of Fisher Scoring iterations: 5

> modelB <- glm(y ~ prebake + flux + speed + cooling + temp,

+ family=poisson, data=wavesolder)

> summary(modelB)

Call:

glm(formula = y ~ prebake + flux + speed + cooling + temp, family = poisson, data = wavesolder)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-7.7230  -2.0135  -0.2761   1.5991  13.2687

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.88947    0.07576  38.142  < 2e-16 ***
prebake2     0.64801    0.05450  11.891  < 2e-16 ***
flux2       -0.52878    0.05262 -10.049  < 2e-16 ***
speed2       1.21614    0.06098  19.943  < 2e-16 ***
cooling2    -0.14222    0.05279  -2.694  0.00706 **
temp2       -0.66902    0.05463 -12.247  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1450.52  on 47  degrees of freedom
Residual deviance:  506.48  on 42  degrees of freedom
AIC: 749.22

Number of Fisher Scoring iterations: 5

> anova(modelA, modelB, test="Chisq")

Analysis of Deviance Table

Model 1: y ~ prebake + flux + speed + temp

Model 2: y ~ prebake + flux + speed + cooling + temp

  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        43     513.75
2        42     506.48  1   7.2729    0.007 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
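The p-value in this table can be reproduced by hand. As a sketch: under the Poisson assumption, the drop in residual deviance between two nested models is compared with a chi-squared distribution whose degrees of freedom equal the number of added parameters. The variable names below are illustrative only, and the deviances are the rounded values printed above.

```r
# Values copied from the analysis of deviance table.
dev.A <- 513.75   # residual deviance of modelA
dev.B <- 506.48   # residual deviance of modelB
df.A <- 43        # residual degrees of freedom of modelA
df.B <- 42        # residual degrees of freedom of modelB

dev.drop <- dev.A - dev.B   # change in deviance
df.drop <- df.A - df.B      # number of extra parameters in modelB

# Upper tail probability of the chi-squared reference distribution.
p.value <- pchisq(dev.drop, df = df.drop, lower.tail = FALSE)
p.value
```

The result should agree with the printed `Pr(>Chi)` up to rounding of the deviances.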

> (phi <- sum(residuals(modelB, type="pearson")^2/modelB$df.residual))

[1] 13.93209

> modelC <- glm(y ~ prebake + flux + speed + temp, family=quasipoisson, data=wavesolder)

> modelD <- glm(y ~ prebake + flux + speed + cooling + temp, family=quasipoisson,

+ data=wavesolder)

> anova(modelC, modelD, test="F")

Analysis of Deviance Table

Model 1: y ~ prebake + flux + speed + temp

Model 2: y ~ prebake + flux + speed + cooling + temp

  Resid. Df Resid. Dev Df Deviance     F Pr(>F)
1        43     513.75
2        42     506.48  1   7.2729 0.522  0.474
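The F value in this table is consistent with the other printed quantities. The sketch below assumes that the statistic is the change in deviance per added parameter, divided by the Pearson dispersion estimate of the larger model, and that the reference distribution is F with the number of added parameters and the larger model's residual degrees of freedom. The variable names are illustrative only.

```r
# Values copied from the R output above.
dev.drop <- 7.2729   # change in deviance between the two models
df.drop <- 1         # number of added parameters
phi <- 13.93209      # Pearson dispersion estimate printed earlier
df.resid <- 42       # residual degrees of freedom of the larger model

# Scaled change in deviance and its upper tail probability.
F.stat <- (dev.drop / df.drop) / phi
p.value <- pf(F.stat, df1 = df.drop, df2 = df.resid, lower.tail = FALSE)
c(F = F.stat, p = p.value)
```

Both numbers should match the table up to rounding.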

(a) For modelA, assuming Poisson responses, what is the log-likelihood of the fitted model, and the log-likelihood of the full (saturated) model?

(b) Assuming Poisson responses, which is better, modelA or modelB? Give two (quantitative) reasons for your answer.

(c) Give an estimate for the expected number of defects, for prebake = 1, flux = 2, speed = 2, cooling = 1, temp = 1, under modelB.

(d) Give a (quantitative) reason why modelB may suffer from overdispersion.

(e) Briefly explain the difference between a Poisson and quasi-Poisson model.

(f) Give the std. error for cooling2 in the case where we allow for overdispersion.

(g) Allowing for overdispersion, do you prefer modelC or modelD, and why?

(h) What formula has been used to calculate the F statistic in the second analysis of deviance? What are the degrees of freedom for the F statistic?

Question 4 (14 marks) The following three-way table refers to results of a case-control study about effects of cigarette smoking and coffee drinking on myocardial infarction (MI) or heart attack for a sample of men under 55 years of age.

                          Cigarettes per Day
Cups Coffee        0              1-24            25-34            ≥ 35
per Day      Cases Controls  Cases Controls  Cases Controls  Cases Controls
0               66      123     30       52     15       12     36       13
1-2            141      179     59       45     53       22     69       25
3-4            113      106     63       65     55       16    119       30
≥ 5            129       80    102       58    118       44    373       85

Eight log-linear models with Poisson error have been fitted, with the residual deviances given in the following table.

Model                                      Residual deviance
coffee + cigar + MI                                   607.25
coffee + cigar*MI                                     394.43
cigar + coffee*MI                                     484.70
MI + coffee*cigar                                     271.40
coffee*cigar + coffee*MI                              148.81
coffee*cigar + cigar*MI                                58.55
coffee*MI + cigar*MI                                  271.88
coffee*cigar + coffee*MI + cigar*MI                    11.17

You will find the following chi-squared percentage points useful for problems (c) and (d).

> qchisq(0.95, df=5:10)

[1] 11.07050 12.59159 14.06714 15.50731 16.91898 18.30704

> qchisq(0.95, df=11:15)

[1] 19.67514 21.02607 22.36203 23.68479 24.99579

> qchisq(0.95, df=16:20)

[1] 26.29623 27.58711 28.86930 30.14353 31.41043

(a) What are the residual degrees of freedom (d.f.) for each of the three models: coffee + cigar + MI, cigar + coffee*MI, and coffee*cigar + cigar*MI?

(b) Give an interpretation for each of the following models.

(i) coffee + cigar + MI

(ii) MI + coffee*cigar

(iii) coffee*cigar + coffee*MI

(d) Test the hypothesis that the association between MI and cigar is the same for all coffee levels. That is, test that there is no three-way interaction (at the 95% level).

Question 5 (14 marks)

(a) Here is some R code for simulating a discrete random variable Y. What is the probability mass function (pmf) of Y, i.e., P(Y = y) for y ≥ 2?

Y.sim <- function() {

U <- runif(1)

Y <- 2

Y <- Y + 1

}

return(Y)

}

(b) Let a random number X be generated by the following algorithm:

1° Generate U from Unif(0, 1) and V from Unif(0, 1) independently.

2° If U + V < 1, then X = 1 − U; otherwise, go to 1°.

What is the probability density function of X, i.e., f(x) for 0 < x < 1?
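The algorithm can be run directly to get an empirical feel for the distribution of X before deriving it. This is only a sketch: the function name `X.sim`, the seed, and the sample size are illustrative choices, and it assumes the accepted value in step 2° is X = 1 − U.

```r
# Accept-reject sampler following steps 1 and 2 of the algorithm.
# Assumption: the accepted value is X = 1 - U.
X.sim <- function() {
  repeat {
    U <- runif(1)                  # step 1: draw U and V independently
    V <- runif(1)
    if (U + V < 1) return(1 - U)   # step 2: accept, otherwise draw again
  }
}

set.seed(1)
x <- replicate(10000, X.sim())

# Empirical proportion of draws in five equal-width bins on (0, 1).
table(cut(x, breaks = seq(0, 1, by = 0.2))) / length(x)
```

Any candidate density f(x) can then be checked by comparing its integral over each bin with these proportions.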

Question 6 (18 marks) Consider a random sample X from a Bernoulli distribution with pdf $f(x \mid \theta) = \theta^{x} (1 - \theta)^{1 - x}$; $x = 0, 1$. Let the prior distribution for θ be Uniform(0, 1), i.e., p(θ) = 1 for 0 < θ < 1. We use the squared error loss function.

(a) Find the posterior distribution of θ .

(b) Find the Bayes estimator of θ .

(d) Find the Bayes risk of the Bayes estimator of θ .

Question 7 (20 marks) We assume that $x_1, \dots, x_{n_1}$ and $y_1, \dots, y_{n_2}$ are independently normally distributed as follows.

$$x_i \sim N(\mu_1, \sigma^2), \quad i = 1, \dots, n_1$$

$$y_i \sim N(\mu_2, \sigma^2), \quad i = 1, \dots, n_2$$

We impose the following prior distributions on $\mu_1$, $\mu_2$ and $\tau = 1/\sigma^2$.

$$p(\mu_1) \propto 1$$

$$p(\mu_2) \propto 1$$

$$p(\tau) \propto 1/\tau$$

(a) Among $\mu_1$, $\mu_2$ and $\tau$, which parameter(s) have an improper prior?

$\mu_1 \mid x, y, \mu_2, \tau$

$\mu_2 \mid x, y, \mu_1, \tau$

$\tau \mid x, y, \mu_1, \mu_2$

Hence give the (conditional) distributions of these variables, including their parameters.

(d) How would you check for convergence of the Gibbs sampler? Provide both informal (graphical/visual checks) and formal methods, and briefly describe each method.

Question 8 (14 marks) Briefly describe an algorithm to simulate samples from the posterior predictive distribution:

$$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta.$$

How would you estimate the mean of the posterior predictive distribution using the simulated samples?
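For concreteness, here is a minimal sketch of one common approach, composition sampling, on a hypothetical conjugate model that is not part of the question: Poisson counts with a Gamma(a, b) prior on the rate, so the posterior is Gamma(a + Σy, b + n). The data, prior values, and variable names are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical data and prior (illustration only).
y <- c(3, 5, 2, 4, 6)   # observed Poisson counts
a <- 1                  # prior shape
b <- 1                  # prior rate
S <- 20000              # number of simulated draws

# Step 1: draw theta from the posterior p(theta | y).
theta <- rgamma(S, shape = a + sum(y), rate = b + length(y))

# Step 2: for each theta, draw a new observation from p(y.tilde | theta).
y.tilde <- rpois(S, lambda = theta)

# The sample mean of the draws estimates the posterior predictive mean.
pred.mean <- mean(y.tilde)
exact.mean <- (a + sum(y)) / (b + length(y))   # closed form for this model
c(simulated = pred.mean, exact = exact.mean)
```

When the posterior is not available in closed form, step 1 would instead use draws of θ from an MCMC sampler; step 2 and the averaging are unchanged.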