STATS 330 Mid-Semester Test Semester 2, 2019
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
STATS 330
Semester 2, 2019
Statistical Modelling
STATS 330 Mid-Semester Test
Instructions:
1. Time allowed: 50 minutes.
2. Use Use the answe book provided.
3. Give us your details on the front page of the answer book.
4. In general, show your working if required.
5. There are useful equations on the final page of this test.
The following data show the distribution of 1607 currently married and fecund women interviewed in the Fiji Fertility Survey, according to age, education, desire for more children and current use of contraception. We are interested in which of these variables are related to contraceptive use, and how.
This data set came from the following article:
Little, R. J. A. (1978). Generalized Linear Models for Cross-Classified Data from the WFS. World Fertility Survey Technical Bulletins, Number 5.
The data set BC.df has the following variables:
age |
The woman’s age: "<25" "25-29", "30-39 and "40-49" |
education |
education level: "low" or "high" |
wantsMore |
want to have more children: "yes" or "no" |
notUsing |
the count of the number of women not using contraception |
Using |
the count of the number of women using contraception |
In the analysis below, all of the explanatory variables are treated as a factors.
head(BC.df, 4)
## age education wantsMore notUsing using
## 1 <25 low yes 53 6
## 2 <25 low no 10 4
## 3 <25 high yes 212 52
## 4 <25 high no 50 10
Question 1
Consider the following model fitted to these data in R:
bin1.fit <- glm(cbind(using, notUsing) ~ wantsMore + age + education,
family = "binomial" ,data = BC.df)
summary(bin1.fit)
##
## Call:
## glm(formula = cbind(using, notUsing) ~ wantsMore + age + education, ## family = "binomial", data = BC.df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.515 -0.938 0.241 0.982 1.733
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.808 0.159 -5.08 3.7e-07 ***
## wantsMoreyes -0.833 0.117 -7.09 1.3e-12 ***
## age25-29 0.389 0.176 2.21 0.0268 *
## age30-39 0.909 0.165 5.52 3.4e-08 ***
## age40-49 1.189 0.214 5.55 2.9e-08 ***
## educationlow -0.325 0.124 -2.62 0.0088 **
## ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## ## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 165.772 on 15 degrees of freedom
## Residual deviance: 29.917 on 10 degrees of freedom
## AIC: 113.4
##
## Number of Fisher Scoring iterations: 4
1-pchisq(bin1.fit$deviance ,bin1.fit$df.residual)
## [1] 0.00088376
Write equations to describe the fitted model bin1.fit.
[5 marks]
logit(pi ) = β0 +β1wantsMorei +β2 age[25-29]i +β3 age[30-39]i +β4 age[40-49]i +β5 education[low]i
Yi ~ Binomial(ni , pi ),
where wantmorei = 1 if the ith woman replied ”yes” to want more children, and 0 otherwise, age[25 - 29]i = 1 if the ith woman replied is aged between 25 and 29 to, and 0 otherwise, age[30 - 39]i = 1 if the ith woman replied is aged between 25 and 29 to, and 0 otherwise, age[40 - 49]i = 1 if the ith woman replied is aged between 25 and 29 to, and 0 otherwise, education[low]i = 1 if the ith woman replied has a low education, 0, otherwise.
Question 2
Also consider the following model:
bin2.fit=glm(cbind(using, notUsing) ~ wantsMore+age*education,
family = "binomial" , data = BC.df)
anova(bin1.fit , bin2.fit , test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: cbind(using, notUsing) ~ wantsMore + age + education
## Model 2: cbind(using, notUsing) ~ wantsMore + age * education
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 10 29.9
## 2 7 23.1 3 6.77 0.08 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(a) What is the null hypothesis being tested by the anova() function?
[5 marks]
The null hypothesis is that there is no interaction between age and education.
(b) What do you conclude from this hypothesis test?
[5 marks]
We have, at best, weak evidence to suggest that an interaction exists between age and education. Given that this would require an additional three terms, it’s possibly not worth th effort.
Question 3
Consider the following R output:
df.residual(bin1.fit)
## [1] 10
1 - pchisq(deviance(bin1.fit), df.residual(bin1.fit))
## [1] 0.00088376
deviance(bin2.fit)
## [1] 23.151
df.residual(bin2.fit)
## [1] 7
1 - pchisq(deviance(bin2.fit), df.residual(bin2.fit))
## [1] 0.0016043
(a) Based on the above output, do you think the model bin1.fit is appropriate? Explain your answer.
[5 marks]
No, we have strong evidence to suggest the model is not appropriate. If the model is correct, we might expect the deviance to come from a chi-squared distribution with 10 degrees of freedom, but this is not plausible.
OR
We can’t really say from the information provided. We don’t know if the deviance is well approx- imated by a chi-square distribution unless we know whether or not the number of people in each group is sufficiently large and the proportion of people is sufficiently far away from 0 and 1.
(b) Based on the above output, do you think the model bin2.fit is appropriate? Explain your answer.
[5 marks]
No, we have strong evidence to suggest the model is not appropriate. If the model is correct, we might expect the deviance to come from a chi-squared distribution with 7 degrees of freedom, but this is not plausible.
OR
We can’t really say from the information provided. We don’t know if the deviance is well approx- imated by a chi-square distribution unless we know whether or not the number of people in each group is sufficiently large and the proportion of people is sufficiently far away from 0 and 1.
Question 4
Suppose the researchers have theoretical reasons to believe that interaction effects are not so they consider the following models.
######## Quasi-Binomial model #####
quasibin1.fit <- glm(cbind(using, notUsing)~ wantsMore + age + education, family = "quasibinomial" , data=BC.df)
summary(quasibin1.fit)
##
## Call:
## glm(formula = cbind(using, notUsing) ~ wantsMore + age + education, ## family = "quasibinomial", data = BC.df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.515 -0.938 0.241 0.982 1.733
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.808 0.267 -3.02 0.0128 *
## wantsMoreyes -0.833 0.198 -4.22 0.0018 **
## age25-29 0.389 0.296 1.32 0.2174
## age30-39 0.909 0.277 3.28 0.0083 **
## age40-49 1.189 0.361 3.30 0.0080 **
## educationlow -0.325 0.209 -1.56 0.1503
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ##
## (Dispersion parameter for quasibinomial family taken to be 2.8288) ##
## Null deviance: 165.772 on 15 degrees of freedom
## Residual deviance: 29.917 on 10 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
confint(quasibin1.fit)
## Waiting for profiling to be done...
present,
## |
2.5 % |
97.5 % |
## (Intercept) |
-1.34697 |
-0.295060 |
## wantsMoreyes |
-1.22190 |
-0.446586 |
## age25-29 |
-0.18652 |
0.976756 |
## age30-39 |
0.37550 |
1.464040 |
## age40-49 |
0.48727 |
1.904425 |
## educationlow |
-0.73869 |
0.080112 |
plot(predict(quasibin1.fit), residuals(quasibin1.fit),
xlab= "fitted" ,ylab = "Residuals")
abline(h = 0 , lty = "dotted")
−2.0 −1.5 −1.0 −0.5 0.0
fitted
######## Beta Binomial model #####
# null model
BB.fit.null <- vglm(cbind(using, notUsing)~ 1 , family = "betabinomial" , data=BC.df)
# fitted model
BB.fit <- vglm(cbind(using, notUsing)~ wantsMore + age + education, family = "betabinomial" , data=BC.df)
summaryvglm(BB.fit , HDEtest = F)
##
## Call:
## vglm(formula = cbind(using, notUsing) ~ wantsMore + age + education, ## family = "betabinomial", data = BC.df)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## logitlink(mu) -2.042 -0.555 0.204 0.858 1.41
## logitlink(rho) -0.712 -0.649 -0.266 0.521 2.38
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -0.926 0.228 -4.06 5.0e-05 ***
## (Intercept):2 -4.750 0.797 -5.96 2.5e-09 ***
## wantsMoreyes -0.847 0.165 -5.12 3.0e-07 ***
## age25-29 0.488 0.262 1.86 0.063 .
## age30-39 1.043 0.245 4.26 2.0e-05 ***
## age40-49 1.326 0.283 4.68 2.9e-06 ***
## educationlow -0.342 0.171 -2.01 0.045 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ##
## Names of linear predictors: logitlink(mu), logitlink(rho)
##
2022-01-19