Answers for MA6521 2020 Exam
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
Answers for MA6521 2020 Exam
Note there is often more than one possible answer to each question. The answers are illustrative examples. Note possible to use either SPSS or R as applicable.
1a)
The diastolic blood pressure before treatment is related to gender.
Firstly we test for normality:
H0: Before (split by gender) follows a normal distribution
H1: Before (split by gender) does not follow a normal distribution
No evidence to reject null hypothesis (p-values all greater than 0.05).
The we test for a difference between gender:
H0: There is no difference between gender
H1: There is a difference between gender
There is no evidence to reject the null hypothesis (p-value 0.286), therefore there is no evidence of a difference in gender.
Both drugs are effective in reducing diastolic blood pressure.
First split by drug and create differences for each drug (Diffstandard = Before – After, for standard drug only,
similarly for Diffnew)
Then check whether variables Diffstandard and Diffnew follow a normal distribution.
H0: Diffstandard/new follows a normal distribution
H1: Diffstandard/new does not follow a normal distribution
Reject the null hypothesis in both cases therefore is evidence to suggest diff does not follow a normal distribution, and we should use a non-parametric test.
H0: There is no difference between before and after (for each drug)
H1: There is a difference between before and after (for each drug)
Reject the null hypothesis for both drugs (p-value <0.001), therefore is evidence to suggest there is a difference between before and after.
The hypothesis test combined with the difference in means provides evidence that both drugs reduce diastolic blood pressure.
The new drug is more effective than the standard.
This question could be answered by considering whether the reduction is blood pressure is related to drug. First create new variable diff = before – after
Firstly we test for normality.
H0: Diff (split by drug) follows a normal distribution
H1: Diff (split by drug) does not follow a normal distribution
Reject the null hypothesis in both cases therefore is evidence to suggest diff does not follow a normal distribution, and we should use a non-parametric test.
H0: For the factor drug there is no difference between variable diff
H1: For the factor drug there is a difference between variable diff
Reject the null hypothesis (p-value <0.001) and conclude that the factor drug has an effect on the change in diastolic blood pressure.
The mean difference for standard drug is less than for the new drug, combined with hypothesis test, this suggests the new drug is more effective, reducing diastolic blood pressure by more on average.
Whether a larger number of participants is needed
We have proved all the student’s statements apart from there is a difference between gender. We would need a
larger sample size to do this. Basic power analysis can be done in R using the package pwr. We use a power of 0.8,
and look at the sample size needed for a difference of . We use the sample means for and the pooled
standard deviation for .
R code:
> by(examdata$before,examdata$gender,mean)
examdata$gender: 0
[1] 87.16
--------------------------------------------------------------------------------------
examdata$gender: 1
[1] 86.64
> by(examdata$before,examdata$gender,sd)
examdata$gender: 0
[1] 1.7
--------------------------------------------------------------------------------------
examdata$gender: 1
[1] 1.704895
> avsd=sqrt((1.7^2+1.704895^2)/2)
> d <- (87.16-86.64)/avsd
> pwr.t.test(d=d,sig.level=0.05,power=0.8,type = "two.sample")
Two-sample t test power calculation
n = 169.2247
d = 0.3054423
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
Need 170 females and 170 males to detect a difference that small between males and females. (Note this is based on a power of 0.8, changing the power will change the sample size).
b)
This report is about data from a study on blood pressure medication.
There is no evidence that the gender affects diastolic blood pressure before the drug is taken. A larger sample of 170 females and 170 males would be needed to be able to show the average difference of 0.52 between men and women is not just due to chance.
There is strong evidence in the data that taking either drug reduces diastolic blood pressure.
There is also strong evidence that those on new drug has a larger reduction in diastolic blood pressure than the standard drug. The mean reduction in diastolic blood pressure for the new drug is 10.3 compared to 8.1 for the standard drug.
In conclusion taking either drug reduced blood pressure with the new drug giving a larger reduction in blood pressure.
2)
Fit a Binary GLM as pest is a binary variable.
Species is a factor and length is a continuous covariate and site is a random effect.
First consider all variables except random effect:
R code to open data set:
> examdata <- read_excel("exam2020.xlsx",sheet=2)
> head(examdata)
R code to fit the full model (without random effect).
R code to fit the other combinations of models
> models<-glm(pest~ factor(species),data=examdata,family = binomial())
> modelc<-glm(pest~ circ,data=examdata,family = binomial())
> modelconst<-glm(pest~ 1,data=examdata,family = binomial())
Model selection GLMs AIC:
The best model, the one with the smallest AIC, includes circumference but not species. The model weight is 0.94, so there is one best model.
R code for model selection analysis of deviance.
> # Level 1 - compare to constant model
> Modellist <- list( )
> Modellist[[1]] <- models
> Modellist[[2]] <- modelc
> modelcompare <- modelconst
> modelname <- c("species", "circ")
> dev <- rep(0,(length(Modellist)))
> for (i in 1:(length(Modellist))){dev[i]=Modellist[[i]]$deviance}
> devcompare <- modelcompare$deviance
> DF <- rep(0,(length(Modellist)))
> for (i in 1:(length(Modellist))){DF[i]=Modellist[[i]]$df.residual}
> DFcompare <- modelcompare$df.residual
> diffdev=devcompare-dev
> diffDF=DFcompare-DF
> pval=pchisq(diffdev, df=diffDF, lower.tail=FALSE)
> devtable = data.frame(modelname,diffdev,diffDF,pval)
> colnames(devtable) <- c("Model", "Diff Deviance","Diff DF","p-value")
> print(devtable)
Model Diff Deviance Diff DF p-value
1 species 0.1827146 3 9.803303e-01
2 circ 34.3481438 1 4.608397e-09
At level 1 (comparing to constant model) only circumference makes a significant improvement to the deviance of the model.
> #Level 2 - compare to circ
> Modellist <- list( )
> Modellist[[1]] <- modelsc
> modelcompare <- modelc
> modelname <- c("circ+species")
> dev <- rep(0,(length(Modellist)))
> for (i in 1:(length(Modellist))){dev[i]=Modellist[[i]]$deviance}
> devcompare <- modelcompare$deviance
> DF <- rep(0,(length(Modellist)))
> for (i in 1:(length(Modellist))){DF[i]=Modellist[[i]]$df.residual}
> DFcompare <- modelcompare$df.residual
> diffdev=devcompare-dev
> diffDF=DFcompare-DF
> pval=pchisq(diffdev, df=diffDF, lower.tail=FALSE)
> devtable = data.frame(modelname,diffdev,diffDF,pval)
> colnames(devtable) <- c("Model", "Diff Deviance","Diff DF","p-value")
> print(devtable)
Model Diff Deviance Diff DF p-value
1 circ+species 0.3253402 3 0.9551943
At level 2 (comparing to circumference) species is not significant. Therefore the best model includes only circumference.
Both AIC and analysis of deviance select the best model includes only circumference
R code for fitting circ + species random effect
> modelcr <- glmer(pest~ circ+(1|site),data=examdata,family = binomial())
LRT testing whether need random effect
> LRT <- 2*(logLik(modelcr)[1]-logLik(modelc)[1])
> pvalue <- 1-pchisq(LRT, df=1)
> LRTresults <- data.frame(LRT,pvalue)
> print(LRTresults)
LRT pvalue
1 91.39664 0
P-value < 0.0001 so include random effect
2c
The best model includes the covariate circumference and the random site effect .
The parameter estimates and standard errors for this model are as follows:
Parameter Estimate SE
Intercept -3.85 0.85
Circ 0.034 0.0052
Site RE (variance) 1.46
The probability the pest is present increases as the circumference of the tree increases. The random site effect has a variance of 1.46.
2022-05-09