闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

STATS 330

Semester 2, 2020

Statistical Modelling

STATS 330 Mid-Semester Test

Instructions:

1. Time allowed: 75 minutes.

2. Use the Rmarkdown ﬁle provided fro answering these questions.

3. Give us your details on the front page of this Rmarkdown ﬁle.

4. You may wish to use R to do some calculations.

5. In general, show your working if required.

6. Answers only require 2-3 short, informative, sentences.

7. There are useful equations on the ﬁnal page of this test.

Biologists are interested in thee autumn migration of the Wood lark (Lullula arborea - pictured) at Hanko bird observatory. The observatory is located in southwestern Finland on a peninsula known to be a bottleneck for terrestrial bird migration in Finland’s autumn. We use migration data from the autumn migration season (1 September–10 November) in 2007–2009. The data consist of daily counts of actively migrating birds during the ﬁrst four hours of daylight, starting from sunrise. Biologists know from past studies that the number migrating peaks in the middle of this season.

The data set WL.df has the following variables:

number

DOY

day

year

The number recorded for that day,

The day of the year 245=1st September, 315=10th November,

The day of the study 1=1st September, 71=10th November,

The year of study 7=2007, 8=2008, and 9=2009 .

1. Consider the following model and plots of these data. Do these plots support the opinions of past biologists? Comment brieﬂy.

[5 marks]

## gam model smoother on day

gam.fit <- gam(number~ s(day), family = "poisson" ,

data = WL.df)

# plot smoother

plot(gam.fit)

0 10 20 30 40 50 60 70

day

It the numbers migrate peak mid-way through the season form the smoothing used in the gam model. This curve may be approximated by using a quadratic relationship when we wish to ﬁt a parametric glm model.

2. Consider following analyses and models which attempt to capture the features of the gam model ﬁtted above. Which of these models best ﬁt these data? Show all calculations and

explain brieﬂy.

[5 marks]

Hint: pchisq.

## Poisson

WL.poiss=glm(number~day+I(day^2), family=poisson, data=WL.df)

slimSummary(WL.poiss)

## Call:

## glm(formula = number ~ day + I(day^2), family = poisson, data = WL.df) ##

## Coefficients:

## Estimate Std. Error z value Pr(>|z|)

## (Intercept) -0.693470 0.168888 -4.11 4e-05 ***

## day 0.173726 0.010319 16.84 <2e-16 ***

## I(day^2) -0.002631 0.000148 -17.73 <2e-16 ***

## ---

## (Dispersion parameter for poisson family taken to be 1)

## Null deviance: 2464.7 on 212 degrees of freedom

## Residual deviance: 1973.0 on 210 degrees of freedom

# negative binomial

WL.NB=glm.nb(number~day+I(day^2), data=WL.df)

slimSummary(WL.NB)

## Call:

## glm.nb(formula = number ~ day + I(day^2), data = WL.df, init.theta = 0.2679468613 ##

## Coefficients:

## Estimate Std. Error z value Pr(>|z|)

## (Intercept) -1.380974 0.480715 -2.87 0.0041 **

## day 0.212087 0.030490 6.96 3.5e-12 ***

## I(day^2) -0.003060 0.000416 -7.35 1.9e-13 ***

## ---

## (Dispersion parameter for Negative Binomial(0.2679) family taken to be 1) ##

## Null deviance: 226.99 on 212 degrees of freedom

## Residual deviance: 182.56 on 210 degrees of freedom

# Poisson fit

deviance(WL.poiss); df.residual(WL.poiss)

## [1] 1973

## [1] 210

1-pchisq(deviance(WL.poiss), df.residual(WL.poiss))

## [1] 0

# NB fit

deviance(WL.NB); df.residual(WL.NB)

## [1] 182.56

## [1] 210

1-pchisq(deviance(WL.NB), df.residual(WL.NB))

## [1] 0.91455

Clearly the model WL.NB has a deviance that is just due to random variation indicating a good ﬁt to these data. WL.poiss does not.

3. The following are the quantised residual plots for these two models, respectively. Explain what features of the Poisson model’s residuals that the negative binomial model addressed.

Explain brieﬂy.

[5 marks]

# Poisson model

plot(fitted(WL.poiss),qresiduals(WL.poiss))

0 2 4 6 8

fitted(WL.poiss)

# NB model

plot(fitted(WL.NB),qresiduals(WL.NB))

0 2 4 6 8 10

fitted(WL.NB)

The quantised residuals for WL.poiss appear to have increasing variability with increase ﬁt and appear to be right skew (and have greater overall variability). The quantised residuals of WL.NB look approximately Normal(0, 1) - which is ideal.

4. One of these biologists decided to ﬁt a quasi-Poisson model (called WL.qpois) and noticed that it gave the same parameter estimates as the model WL.poiss. However, their standard errors (and subsequent output) diﬀers.

Explain how the standard errors for the quasi-Poisson model WL.qpois diﬀer from the stan- dard errors for the Poison model WL.poiss in terms of the estimated dispersion parameter obtained from WL.qpois

[5 marks]

# quasi Poisson

WL.qpoiss=glm(number~day+I(day^2), family=quasipoisson, data=WL.df) slimSummary(WL.qpoiss)

## Call:

## glm(formula = number ~ day + I(day^2), family = quasipoisson, ##

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -0.693470 0.644359 -1.08 0.28

## day 0.173726 0.039371 4.41 1.6e-05 ***

## I(day^2) -0.002631 0.000566 -4.65 5.9e-06 ***

## ---

## (Dispersion parameter for quasipoisson family taken to be 14.557) ##

## Null deviance: 2464.7 on 212 degrees of freedom

## Residual deviance: 1973.0 on 210 degrees of freedom

Hint: Estimated dispersion parameter=14.557.

The standard errors of the coeﬃcients of WL.poiss are larger by a factor of ,14.557 = 3.8153 than those of WL.poiss. The more appropriate model WL.NB accounts for the greater variability than expected from the model WL.poiss.

summary(WL.qpoiss)$coef

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -0.6934703 0.64435877 -1.0762 2.8306e-01

## day 0.1737262 0.03937094 4.4126 1.6325e-05

## I(day^2) -0.0026311 0.00056616 -4.6473 5.9345e-06

summary(WL.poiss)$coef

## Estimate Std. Error z value Pr(>|z|)

## (Intercept) -0.6934703 0.16888807 -4.1061 4.0241e-05

## day 0.1737262 0.01031922 16.8352 1.3472e-63

## I(day^2) -0.0026311 0.00014839 -17.7309 2.4205e-70

summary(WL.qpoiss)$coef[,2];summary(WL.poiss)$coef[,2]

## (Intercept) day I(day^2)

## 0.64435877 0.03937094 0.00056616

## (Intercept) day I(day^2)

## 0.16888807 0.01031922 0.00014839

summary(WL.qpoiss)$coef[,2]/summary(WL.poiss)$coef[,2]

## (Intercept)

## 3.8153

day

3.8153

I(day^2)

3.8153

sqrt(summary(WL.qpoiss)$dispersion)

## [1] 3.8153

Consider the following information:

The quadratic equation y = a + bx + cx2 for c < 0 has a maximum value xmax , say, at

5. Use the above information to determine from the model WL.NB when the time of peak migration occurs —in terms of days starting from the 1st of September.

[5 marks]

We have that log(µi ) = β0 + β1 day + β2 day2

βˆ1 0.21209

2βˆ2 2 × (_0.00306)

6. Calculate the Pearson residual when day=36 in 2009 using the model WL.NB.

[5 marks]

Hints:

WL.df$number[WL.df$day==36&WL.df$year==9]

## [1] 21

predict(WL.NB, newdata=data.frame(day=36))

## 1

## 2.2884

WL.NB$theta

## [1] 0.26795

= exp(2.28844) = 9.85955.

r() = + = 9.85955 + = 372.65829.

and he Pearson residual is

r = = 0.57709.

#check

id= (WL.df$day==36&WL.df$year==9)

residuals.glm(WL.NB, type= "pearson")[id]

## 178

## 0.57709

7. A mathematically inclined STATS 330 student realised that if they take the mid-point value of day (i.e. day=36) they can ﬁt a model with fewer parameters.

They ﬁtted the following model and calculated the log likelihood for this model:

WL.NB2=glm.nb(number~I((day-36)^2),data=WL.df)

logLik(WL.NB2)

## 'log Lik.' -429.23 (df=3)

The log-likelihood statistic for WL.NB is:

logLik(WL.NB)

## 'log Lik.' -428.6 (df=4)

Calculate the AIC for each of these two models and decide which model is best based on the AIC criteria.

[5 marks]

AIC = _2loglik + 2p where p is the number of parameters estimated (including dispersion parameters) , and so

AIC(WL.NB) = 2 × (428.6045) + 2 × 4 = 865.21.

AIC(WL.NB2) = 2 × (429.22605) + 2 × 3 = 864.45. So we choose WL.NB2 as it has a lower AIC (just).

# check via R

AIC(WL.NB)

## [1] 865.21

AIC(WL.NB2)

## [1] 864.45

A biologist wished to ascertain when when the number of migrating birds was zero and decided to ﬁt the following model. It was of particular importance to determine at what time periods there was less than a 50% chance of observing at least one bird.