闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

STATS 330

Semester 2, 2019

Statistical Modelling

STATS 330 Mid-Semester Test

Instructions:

1. Time allowed: 50 minutes.

2. Use Use the answe book provided.

3. Give us your details on the front page of the answer book.

4. In general, show your working if required.

5. There are useful equations on the ﬁnal page of this test.

The following data show the distribution of 1607 currently married and fecund women interviewed in the Fiji Fertility Survey, according to age, education, desire for more children and current use of contraception. We are interested in which of these variables are related to contraceptive use, and how.

This data set came from the following article:

Little, R. J. A. (1978). Generalized Linear Models for Cross-Classiﬁed Data from the WFS. World Fertility Survey Technical Bulletins, Number 5.

The data set BC.df has the following variables:

age	The woman’s age: "<25" "25-29", "30-39 and "40-49"
education	education level: "low" or "high"
wantsMore	want to have more children: "yes" or "no"
notUsing	the count of the number of women not using contraception
Using	the count of the number of women using contraception

In the analysis below, all of the explanatory variables are treated as a factors.

head(BC.df, 4)

## age education wantsMore notUsing using

## 1 <25 low yes 53 6

## 2 <25 low no 10 4

## 3 <25 high yes 212 52

## 4 <25 high no 50 10

Question 1

Consider the following model ﬁtted to these data in R:

bin1.fit <- glm(cbind(using, notUsing) ~ wantsMore + age + education,

family = "binomial" ,data = BC.df)

summary(bin1.fit)

## Call:

## glm(formula = cbind(using, notUsing) ~ wantsMore + age + education, ## family = "binomial", data = BC.df)

## Deviance Residuals:

## Min 1Q Median 3Q Max

## -2.515 -0.938 0.241 0.982 1.733

## Coefficients:

## Estimate Std. Error z value Pr(>|z|)

## (Intercept) -0.808 0.159 -5.08 3.7e-07 ***

## wantsMoreyes -0.833 0.117 -7.09 1.3e-12 ***

## age25-29 0.389 0.176 2.21 0.0268 *

## age30-39 0.909 0.165 5.52 3.4e-08 ***

## age40-49 1.189 0.214 5.55 2.9e-08 ***

## educationlow -0.325 0.124 -2.62 0.0088 **

## ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## ## (Dispersion parameter for binomial family taken to be 1)

## Null deviance: 165.772 on 15 degrees of freedom

## Residual deviance: 29.917 on 10 degrees of freedom

## AIC: 113.4

## Number of Fisher Scoring iterations: 4

1-pchisq(bin1.fit$deviance ,bin1.fit$df.residual)

## [1] 0.00088376

Write equations to describe the ﬁtted model bin1.fit.

[5 marks]

logit(pi ) = β0 +β1wantsMorei +β2 age[25-29]i +β3 age[30-39]i +β4 age[40-49]i +β5 education[low]i

Yi ～ Binomial(ni , pi ),

where wantmorei = 1 if the ith woman replied ”yes” to want more children, and 0 otherwise, age[25 - 29]i = 1 if the ith woman replied is aged between 25 and 29 to, and 0 otherwise, age[30 - 39]i = 1 if the ith woman replied is aged between 25 and 29 to, and 0 otherwise, age[40 - 49]i = 1 if the ith woman replied is aged between 25 and 29 to, and 0 otherwise, education[low]i = 1 if the ith woman replied has a low education, 0, otherwise.

Question 2

Also consider the following model:

bin2.fit=glm(cbind(using, notUsing) ~ wantsMore+age*education,

family = "binomial" , data = BC.df)

anova(bin1.fit , bin2.fit , test = "Chisq")

## Analysis of Deviance Table

## Model 1: cbind(using, notUsing) ~ wantsMore + age + education

## Model 2: cbind(using, notUsing) ~ wantsMore + age * education

## Resid. Df Resid. Dev Df Deviance Pr(>Chi)

## 1 10 29.9

## 2 7 23.1 3 6.77 0.08 .

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(a) What is the null hypothesis being tested by the anova() function?

[5 marks]

The null hypothesis is that there is no interaction between age and education.

(b) What do you conclude from this hypothesis test?

[5 marks]

We have, at best, weak evidence to suggest that an interaction exists between age and education. Given that this would require an additional three terms, it’s possibly not worth th eﬀort.

Question 3

Consider the following R output:

df.residual(bin1.fit)

## [1] 10

1 - pchisq(deviance(bin1.fit), df.residual(bin1.fit))

## [1] 0.00088376

deviance(bin2.fit)

## [1] 23.151

df.residual(bin2.fit)

## [1] 7

1 - pchisq(deviance(bin2.fit), df.residual(bin2.fit))

## [1] 0.0016043

(a) Based on the above output, do you think the model bin1.fit is appropriate? Explain your answer.

[5 marks]

No, we have strong evidence to suggest the model is not appropriate. If the model is correct, we might expect the deviance to come from a chi-squared distribution with 10 degrees of freedom, but this is not plausible.

We can’t really say from the information provided. We don’t know if the deviance is well approx- imated by a chi-square distribution unless we know whether or not the number of people in each group is suﬃciently large and the proportion of people is suﬃciently far away from 0 and 1.

(b) Based on the above output, do you think the model bin2.fit is appropriate? Explain your answer.

[5 marks]

No, we have strong evidence to suggest the model is not appropriate. If the model is correct, we might expect the deviance to come from a chi-squared distribution with 7 degrees of freedom, but this is not plausible.

Question 4

Suppose the researchers have theoretical reasons to believe that interaction eﬀects are not so they consider the following models.

######## Quasi-Binomial model #####

quasibin1.fit <- glm(cbind(using, notUsing)~ wantsMore + age + education, family = "quasibinomial" , data=BC.df)

summary(quasibin1.fit)

## Call:

## glm(formula = cbind(using, notUsing) ~ wantsMore + age + education, ## family = "quasibinomial", data = BC.df)

## Deviance Residuals:

## Min 1Q Median 3Q Max

## -2.515 -0.938 0.241 0.982 1.733

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -0.808 0.267 -3.02 0.0128 *

## wantsMoreyes -0.833 0.198 -4.22 0.0018 **

## age25-29 0.389 0.296 1.32 0.2174

## age30-39 0.909 0.277 3.28 0.0083 **

## age40-49 1.189 0.361 3.30 0.0080 **

## educationlow -0.325 0.209 -1.56 0.1503

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ##

## (Dispersion parameter for quasibinomial family taken to be 2.8288) ##

## Null deviance: 165.772 on 15 degrees of freedom

## Residual deviance: 29.917 on 10 degrees of freedom

## AIC: NA

## Number of Fisher Scoring iterations: 4

confint(quasibin1.fit)

## Waiting for profiling to be done...

present,

##	2.5 %	97.5 %
## (Intercept)	-1.34697	-0.295060
## wantsMoreyes	-1.22190	-0.446586
## age25-29	-0.18652	0.976756
## age30-39	0.37550	1.464040
## age40-49	0.48727	1.904425
## educationlow	-0.73869	0.080112