Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit


STATS 330

Semester 1, 2019

Statistical Modelling

STATS 330 Mid-Semester Test


Question 1

Consider the following model fitted to these data in R:

pois1.fit  <-  glm(Snapper  ~  Reserve  +  Season  + Year,  family  =  "poisson" ,

data  =  snapper.df)

summary(pois1.fit)

##

##  Call:

##  glm(formula  =  Snapper  ~  Reserve  +  Season  + Year,  family  =  "poisson", ##         data  =  snapper.df)

##

##  Deviance  Residuals:

##       Min           1Q   Median           3Q         Max

##  -4.566    -1.925    -0.391      1.193     4.447

##

##  Coefficients:

##                           Estimate  Std.  Error  z  value  Pr(>|z|)

##  (Intercept)       2.2369         0.0543     41.21     <2e-16  ***

##  ReserveHahei    -0.8235         0.0821    -10.03     <2e-16  ***

##  ReserveTAWH     -0.7181         0.0860     -8.35     <2e-16  ***

##  SeasonSpring    -0.7968         0.0668    -11.93     <2e-16  ***

## Year98-99           0.1072         0.0619        1.73       0.084  .

##                                                                                                                    ##  Signif.  codes:    0  '***'  0.001  '**'  0.01  '*'  0.05  '.'  0.1  '  '  1

##                                                                                                            ##  (Dispersion parameter  for poisson  family  taken  to be  1)

##

##         Null  deviance:  1048.12    on  203   degrees  of  freedom

##  Residual  deviance:    740.46    on  199   degrees  of  freedom

##  AIC:  1304

##

##  Number  of  Fisher  Scoring  iterations:  5

confint(pois1.fit)

##  Waiting  for profiling  to  be  done...

##                                  2.5  %     97.5  %

##  (Intercept)     2.129157    2.34196

##  ReserveHahei  -0.986987  -0.66492

##  ReserveTAWH    -0.889776  -0.55226

##  SeasonSpring  -0.928897  -0.66690

## Year98-99       -0.014105    0.22872


(a) Write equations to describe the fitted model pois1.fit.

[5 marks]

(b) Interpret the effect of season based on the model pois1.fit.

[10 marks]

 

Question 2

Also consider the following model:

pois2.fit  <-  glm(Snapper  ~  Reserve  *  Season  * Year,  family  =  "poisson" , data  =  snapper.df)

anova(pois1.fit , pois2.fit ,  test  =  "Chisq")

##  Analysis  of  Deviance  Table

##

## Model  1:  Snapper  ~  Reserve  +  Season  + Year

## Model  2:  Snapper  ~  Reserve  *  Season  * Year

##     Resid.  Df  Resid.  Dev  Df  Deviance  Pr(>Chi)

##  1             199               740

##  2             192               702    7         38.1    2.9e-06  ***

##  ---

##  Signif.  codes:    0  '***'  0.001  '**'  0.01  '*'  0.05  '.'  0.1  '  '  1


(a) What is the null hypothesis being tested by the anova() function?

[10 marks]

(b) What do you conclude from this hypothesis test?

[5 marks]


Question 3

Consider the following R output:

deviance(pois1.fit)

##  [1]  740.46

df.residual(pois1.fit)

##  [1]  199

1  - pchisq(deviance(pois1.fit),  df.residual(pois1.fit))

##  [1]  0

deviance(pois2.fit)

##  [1]  702.38

df.residual(pois2.fit)

##  [1]  192

1  - pchisq(deviance(pois2.fit),  df.residual(pois2.fit))

##  [1]  0

(a) Based on the above output, do you think the model pois1.fit is approrpiate? Explain your answer.

[5 marks]

(b) Based on the above output, do you think the model pois2.fit is appropriate? Explain your answer.

[5 marks]


Question 4

Consider the following models. Note that the summary() output has been truncated to save space.

quasipois.fit  <-  glm(Snapper  ~  Reserve  +  Season  + Year,  family  =  "quasipoisson") summary(quasipois.fit)

##  Deviance  Residuals:

##         Min             1Q     Median             3Q           Max

##  -4.5659    -1.9249    -0.3913      1.1928     4.4473

##

##  Coefficients:

##                           Estimate  Std.  Error  t  value  Pr(>|t|)

##  (Intercept)     2.23693       0.09779    22.874    <  2e-16  ***

##  ReserveHahei  -0.82349       0.14793    -5.567  8.32e-08  ***

##  ReserveTawh    -0.71813       0.15502    -4.633  6.52e-06  ***

##  SeasonSpring  -0.79681       0.12037    -6.620  3.27e-10  ***

## Year98-99         0.10718       0.11157     0.961       0.338

##  ---

##  Signif.  codes:    0  '***'  0.001  '**'  0.01  '*'  0.05  '.'  0.1  '  '  1 ##

##  (Dispersion parameter  for  quasipoisson  family  taken  to be  3.245656) ##  ...

nb.fit  <-  glm.nb(Snapper  ~  Reserve  +  Season  + Year)

summary(nb.fit)

##  Deviance  Residuals:

##         Min             1Q     Median             3Q           Max

##  -2.5791    -1.1752    -0.2334     0.5523      1.9450

##

##  Coefficients:

##                           Estimate  Std.  Error  z  value  Pr(>|z|)

##  (Intercept)       2.3240         0.1286    18.072    <  2e-16  ***

##  ReserveHahei    -0.9775         0.1603    -6.099  1.07e-09  ***

##  ReserveTawh     -0.7884         0.1693    -4.658  3.20e-06  ***

##  SeasonSpring    -0.9199         0.1338    -6.874  6.25e-12  ***

## Year98-99           0.1336         0.1327      1.006       0.314

##                                                                                                                    ##  Signif.  codes:    0  '***'  0.001  '**'  0.01  '*'  0.05  '.'  0.1  '  '  1

##                                                                                                                                              ##  (Dispersion parameter  for  Negative  Binomial(1.5522)  family  taken  to be  1)

##  ...


##

##

##


Theta:    1.552

Std.  Err.:    0.239



plot(predict(quasipois.fit),  residuals(quasipois.fit ,  type  =  "deviance"), ylab  =  "Residual")

abline(h  =  0 ,  lty  =  "dotted")

1.0

1.5

2.0

predict(quasipois.fit)

plot(predict(nb.fit),  residuals(nb.fit ,  type  =  "deviance"),  ylab  =  "Residual") abline(h  =  0 ,  lty  =  "dotted")

0.5                   1.0                   1.5                   2.0                   2.5

predict(nb.fit)

plot(predict(nb.fit),  qresiduals(nb.fit),  ylab  =  "Residual")

abline(h  =  0 ,  lty  =  "dotted")

0.5                   1.0                   1.5                   2.0                   2.5

predict(nb.fit)


Based on the output above, which model do you prefer out of quasipois.fit and nb.fit? Explain your answer.

[10 marks]


Question 5

The second observation in the data set has the following observed values:

snapper.df[2,  ]

##     Reserve  Season   Year  Snapper

##  2     Hahei  Spring  97-98             4

It has the following fitted expected values (2 ) under models pois1.fit and nb.fit:

fitted(pois1.fit)[2]

##           2

##  1.8527

fitted(nb.fit)[2]

##           2

##  1.5321

(a) What is the estimated variance of the response for this observation under the following

models? Calculate your answers to two decimal places, and show your working.

[15 marks]

(i) Model pois1.fit (from Question 1).

(ii) Model quasipois.fit (from Question 4).

(iii) Model nb.fit (from Question 4).



(b) What is this observation’s Pearson residual under the following models?   Calculate your

answers to two decimal placees, and show your working.

[10 marks]

(i) Model pois1.fit (from Question 1).

(ii) Model nb.fit (from Question 4).


 

Question 6

Many of the models fitted above use the following simple additive equation within the model fitting function:

Snapper  ~  Reserve  +  Season  + Year

 

However, this does not account for any interactions that may exist between the variables.

Describe a strategy to find the model with the set of main effects and interaction effects that are best supported by the data.

[15 marks]


Question 7

A fisheries expert believes that snapper prefer one of the sites over the others in autumn, but prefer a different site in spring.  Also, they believe that there were more snapper around for the 98–99 year than the 97–98 year, and that this effect was the same for all seasons and sites.

Which explanatory terms could you include in a model so that it is capable of estimating the effects that the fisheries expert believes to exist?

You can either provide your answer as a description in words or an R equation, for example, in the form

y  ~  x  *  z

[10 marks]


Useful equations

If yi  ~ Normal(ui , 72 ), then

E(yi ) = ui ,

Var(yi ) = 72 .

If yi  ~ Binomial(ni , pi ), then

E(yi ) = nipi ,

Var(yi ) = nipi (1 _ pi ).

If yi  ~ Poisson(ui ), then

E(yi ) = ui ,

Var(yi ) = ui .

If yi  ~ Negative Binomial(ui , 9), then

E(yi ) = ui ,

Var(yi ) = ui +