Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit


STATS 330

Semester 2, 2019

Statistical Modelling

STATS 330 Mid-Semester Test


Instructions:

1. Time allowed: 50 minutes.

2. Use Use the answe book provided.

3.  Give us your details on the front page of the answer book.

4. In general, show your working if required.

5. There are useful equations on the final page of this test.


The following data show the distribution of 1607 currently married and fecund women interviewed in the Fiji Fertility Survey, according to age, education, desire for more children and current use of contraception.  We are interested in which of these variables are related to contraceptive use, and how.

This data set came from the following article:

Little, R. J. A. (1978). Generalized Linear Models for Cross-Classified Data from the WFS. World Fertility Survey Technical Bulletins, Number 5.

The data set BC.df has the following variables:


age

The womans age: "<25"  "25-29", "30-39 and "40-49"

education

education level: "low" or "high"

wantsMore

want to have more children: "yes" or "no"

notUsing

the count of the number of women not using contraception

Using

the count of the number of women using contraception

In the analysis below, all of the explanatory variables are treated as a factors.

head(BC.df,  4)

##     age  education wantsMore notUsing using

##  1  <25 low yes 53 6

##  2  <25 low no 10 4

##  3  <25 high yes 212 52

##  4  <25 high no 50 10


Question 1

Consider the following model fitted to these data in R:

bin1.fit  <- glm(cbind(using, notUsing) ~ wantsMore + age + education,

family  =  "binomial" ,data  =  BC.df)

summary(bin1.fit)

##

##  Call:

##  glm(formula  =  cbind(using, notUsing)  ~ wantsMore  +  age  +  education, ##         family  =  "binomial",  data  =  BC.df)

##

##  Deviance  Residuals:

##       Min           1Q   Median           3Q         Max

##  -2.515    -0.938     0.241     0.982      1.733

##

##  Coefficients:

##                           Estimate  Std.  Error  z  value  Pr(>|z|)

##  (Intercept)       -0.808           0.159     -5.08    3.7e-07  ***

## wantsMoreyes     -0.833           0.117     -7.09    1.3e-12  ***

##  age25-29               0.389           0.176       2.21     0.0268  *

##  age30-39               0.909           0.165       5.52    3.4e-08  ***

##  age40-49               1.189           0.214       5.55    2.9e-08  ***

##  educationlow     -0.325           0.124     -2.62     0.0088  **

##                                                                                                                    ##  Signif.  codes:    0  '***'  0.001  '**'  0.01  '*'  0.05  '.'  0.1  '  ' 1

##                                                                                                             ##  (Dispersion parameter  for binomial  family  taken  to be 1)

##

##         Null  deviance:  165.772    on  15   degrees  of  freedom

##  Residual  deviance:    29.917    on  10   degrees  of  freedom

##  AIC:  113.4

##

##  Number  of  Fisher  Scoring  iterations:  4

1-pchisq(bin1.fit$deviance ,bin1.fit$df.residual)

##  [1]  0.00088376

Write equations to describe the fitted model bin1.fit.

[5 marks]

logit(pi ) = β0 +β1wantsMorei +β2 age[25-29]i +β3 age[30-39]i +β4 age[40-49]i +β5 education[low]i

Yi Binomial(ni , pi ),

where wantmorei  = 1 if the ith  woman replied ”yes” to want more children, and 0 otherwise, age[25 - 29]i  = 1 if the ith  woman replied is aged between 25 and 29 to, and 0 otherwise,      age[30 - 39]i  = 1 if the ith  woman replied is aged between 25 and 29 to, and 0 otherwise,      age[40 - 49]i  = 1 if the ith  woman replied is aged between 25 and 29 to, and 0 otherwise,      education[low]i  = 1 if the ith  woman replied has a low education, 0, otherwise.


Question 2

Also consider the following model:

bin2.fit=glm(cbind(using, notUsing)  ~ wantsMore+age*education,

family  =  "binomial" ,  data  =  BC.df)

anova(bin1.fit , bin2.fit ,  test  =  "Chisq")

##  Analysis  of  Deviance  Table

##

## Model  1:  cbind(using, notUsing)  ~ wantsMore  +  age  +  education

## Model  2:  cbind(using, notUsing)  ~ wantsMore  +  age  *  education

##     Resid.  Df  Resid.  Dev  Df  Deviance  Pr(>Chi)

##  1               10             29.9

##  2                 7             23.1    3         6.77         0.08  .

##  ---

##  Signif.  codes:    0  '***'  0.001  '**'  0.01  '*'  0.05  '.'  0.1  '  '  1


(a) What is the null hypothesis being tested by the anova() function?

[5 marks]

The null hypothesis is that there is no interaction between age and education.

(b) What do you conclude from this hypothesis test?

[5 marks]

We have, at best, weak evidence to suggest that an interaction exists between age and education. Given that this would require an additional three terms, it’s possibly not worth th effort.


Question 3

Consider the following R output:

df.residual(bin1.fit)

##  [1]  10

1  - pchisq(deviance(bin1.fit),  df.residual(bin1.fit))

##  [1]  0.00088376

deviance(bin2.fit)

##  [1]  23.151

df.residual(bin2.fit)

##  [1]  7

1  - pchisq(deviance(bin2.fit),  df.residual(bin2.fit))

##  [1]  0.0016043

(a) Based on the above output, do you think the model bin1.fit is appropriate?  Explain your answer.

[5 marks]

No, we have strong evidence to suggest the model is not appropriate. If the model is correct, we might expect the deviance to come from a chi-squared distribution with 10 degrees of freedom, but this is not plausible.

OR

We can’t really say from the information provided. We don’t know if the deviance is well approx- imated by a chi-square distribution unless we know whether or not the number of people in each group is sufficiently large and the proportion of people is sufficiently far away from 0 and 1.

(b) Based on the above output, do you think the model bin2.fit is appropriate?  Explain your answer.

[5 marks]

No, we have strong evidence to suggest the model is not appropriate. If the model is correct, we might expect the deviance to come from a chi-squared distribution with 7 degrees of freedom, but this is not plausible.

OR

We can’t really say from the information provided. We don’t know if the deviance is well approx- imated by a chi-square distribution unless we know whether or not the number of people in each group is sufficiently large and the proportion of people is sufficiently far away from 0 and 1.


Question 4

Suppose the researchers have theoretical reasons to believe that interaction effects are not so they consider the following models.

########  Quasi-Binomial  model  #####

quasibin1.fit  <- glm(cbind(using, notUsing)~ wantsMore + age + education, family = "quasibinomial" , data=BC.df)

summary(quasibin1.fit)

##

##  Call:

##  glm(formula  =  cbind(using, notUsing)  ~ wantsMore  +  age  +  education, ##         family  =  "quasibinomial",  data  =  BC.df)

##

##  Deviance  Residuals:

##       Min           1Q   Median           3Q         Max

##  -2.515    -0.938     0.241     0.982      1.733

##

##  Coefficients:

##                           Estimate  Std.  Error  t  value  Pr(>|t|)

##  (Intercept)       -0.808           0.267     -3.02     0.0128  *

## wantsMoreyes     -0.833           0.198     -4.22     0.0018  **

##  age25-29               0.389           0.296        1.32     0.2174

##  age30-39               0.909           0.277       3.28     0.0083  **

##  age40-49               1.189           0.361       3.30     0.0080  **

##  educationlow     -0.325           0.209     -1.56     0.1503

##  ---

##  Signif.  codes:    0  '***'  0.001  '**'  0.01  '*'  0.05  '.'  0.1  '  '  1 ##

##  (Dispersion parameter  for  quasibinomial  family  taken  to be  2.8288) ##

##         Null  deviance:  165.772    on  15   degrees  of  freedom

##  Residual  deviance:    29.917    on  10   degrees  of  freedom

##  AIC:  NA

##

##  Number  of  Fisher  Scoring  iterations:  4

confint(quasibin1.fit)

##  Waiting  for profiling  to  be  done...

present,

##

2.5 %

97.5 %

## (Intercept)

-1.34697

-0.295060

## wantsMoreyes

-1.22190

-0.446586

## age25-29

-0.18652

0.976756

## age30-39

0.37550

1.464040

## age40-49

0.48727

1.904425

## educationlow

-0.73869

0.080112


plot(predict(quasibin1.fit),  residuals(quasibin1.fit),

xlab= "fitted" ,ylab  =  "Residuals")

abline(h  =  0 ,  lty  =  "dotted")

−2.0              −1.5              −1.0              −0.5               0.0

fitted

######## Beta Binomial  model  #####

#  null  model

BB.fit.null  <- vglm(cbind(using, notUsing)~ 1 , family = "betabinomial" , data=BC.df)

# fitted  model

BB.fit  <- vglm(cbind(using, notUsing)~ wantsMore + age + education, family = "betabinomial" , data=BC.df)

summaryvglm(BB.fit ,  HDEtest  =  F)

##

##  Call:

##  vglm(formula  =  cbind(using, notUsing)  ~ wantsMore  +  age  +  education, ##         family  =  "betabinomial",  data  =  BC.df)

##

##  Pearson  residuals:

##                                    Min          1Q Median       3Q   Max

##  logitlink(mu)    -2.042  -0.555    0.204  0.858  1.41

##  logitlink(rho)  -0.712  -0.649  -0.266  0.521  2.38

##

##  Coefficients:

##                            Estimate  Std.  Error  z  value  Pr(>|z|)

##  (Intercept):1     -0.926           0.228     -4.06    5.0e-05  ***

##  (Intercept):2     -4.750           0.797     -5.96    2.5e-09  ***

## wantsMoreyes       -0.847           0.165     -5.12    3.0e-07  ***

##  age25-29                 0.488           0.262        1.86       0.063  .

##  age30-39                 1.043           0.245       4.26    2.0e-05  ***

##  age40-49                 1.326           0.283       4.68    2.9e-06  ***

##  educationlow       -0.342           0.171     -2.01       0.045  *

##  ---

##  Signif.  codes:    0  '***'  0.001  '**'  0.01  '*'  0.05  '.'  0.1  '  '  1 ##

##  Names  of  linear predictors:  logitlink(mu),  logitlink(rho)

##