STAT 4052 Homework 5
Q1 - Consider the birthweight.txt file available on Canvas. The dataset contains information regarding 189 births at a US hospital. The main interest is low birth weights. The following variables are available in this dataframe:
low    birth weight less than 2.5 kg (0 or 1 reported),
age    age of the mother in years,
lwt    weight of mother in pounds,
race   black (1) / white (2) / other (3),
smoke  smoking status during pregnancy (TRUE or FALSE reported),
ptd    number of previous premature labours,
ht     history of hypertension (TRUE or FALSE reported),
ui     has uterine irritability (TRUE or FALSE reported),
ftv    number of physician visits in the first trimester (0, 1, 2, 3+ reported).
(a) Fit a binomial GLM with canonical link function using the binary variable low as the response and the other variables as covariates. Report the output of the model you have obtained.
(b) Implement a deviance test for the model fitted in (a) and report your results.
(c) Looking at the deviance test performed in (b), would you conclude that the model provides a good fit for the data? Justify your answer.
(d) How would you interpret the coefficient associated with the variable smoke?
(e) How would you interpret the coefficient associated with the variable lwt?
(f) A woman is about to deliver a baby. She is 40 years old, she is white and her weight is about 170 lbs. She did not smoke during pregnancy, and she has never had premature labours or hypertension. She does suffer from uterine irritability and she visited her physician three times during her first trimester of pregnancy. What is the probability that her baby will have a low birth weight?
(g) Compute a 95% prediction interval for the prediction obtained in (f).
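Note on (d)-(g): with the canonical (logit) link, a coefficient acts on the log-odds scale, so exp(coefficient) is an odds ratio, and a fitted probability is the inverse logit of the linear predictor. One common reading of the interval in (g) is a Wald interval built on the logit scale and then mapped back to the probability scale. The sketch below only illustrates that arithmetic in Python. The numbers in it are hypothetical placeholders, not values from birthweight.txt, and the function names are chosen here for illustration.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def inv_logit(eta):
    """Map a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in a covariate."""
    return math.exp(beta)


def wald_interval_probability(eta, se, z=Z_95):
    """Return (lower, estimate, upper) on the probability scale.

    The interval is formed on the logit scale as eta +/- z * se and then
    transformed, which keeps both limits inside (0, 1).
    """
    return inv_logit(eta - z * se), inv_logit(eta), inv_logit(eta + z * se)


# Hypothetical placeholders: NOT estimates from the birth weight data.
beta_example = 0.5   # assumed coefficient on the log-odds scale
eta_example = -1.0   # assumed linear predictor for one covariate profile
se_example = 0.4     # assumed standard error of that linear predictor

print("odds ratio:", odds_ratio(beta_example))
print("lower, estimate, upper:", wald_interval_probability(eta_example, se_example))
```

In practice the linear predictor and its standard error would come from the model fitted in (a), evaluated at the covariate values described in (f).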
Q2 - Consider the moths dataset from the R package DAAG. The data frame has 41 rows and 4 columns. These data are from a study of the effect of habitat on the densities of two species of moth (A and P). Transects were set across the search area. Within transects, sections were identified according to habitat type. The following variables are available in this dataframe:
meters   length of transect,
A        number of type A moths found,
P        number of type P moths found,
habitat  a factor with levels Bank, Disturbed, Lowerside, NEsoak, NWsoak, SEsoak, SWsoak, Upperside.
(a) Fit a Poisson GLM with canonical link function using the variable A as the response and meters and habitat as predictors. Report the output of the model you have obtained.
(b) Implement a Pearson χ² test for the model fitted in (a) and report your results.
(c) Looking at the Pearson test performed in (b), would you conclude that the model provides a good fit for the data?
(d) How would you interpret the coefficient associated with the variable meters?
(e) From the output obtained in (a), would you expect the Poisson assumption to be valid for these data? Justify your answer.
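Note on (b)-(e): for a Poisson GLM the Pearson statistic is the sum of (y - mu)^2 / mu over the observations, and it is usually compared with a χ² distribution on n - p degrees of freedom. Assuming an intercept, meters, and default treatment coding for the eight habitat levels, a fit to all 41 rows would have p = 9 and 32 residual degrees of freedom. A statistic much larger than its degrees of freedom suggests overdispersion. With the canonical (log) link, exp(coefficient) is a multiplicative change in the expected count. The Python sketch below illustrates these calculations on toy numbers. The counts and fitted means are made up, not taken from the moths data, and the function names are chosen here for illustration.

```python
import math


def pearson_chi2(observed, fitted):
    """Pearson statistic for a Poisson fit: sum of (y - mu)^2 / mu."""
    return sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))


def dispersion(stat, n_obs, n_params):
    """Pearson statistic divided by residual degrees of freedom (n - p).

    Values well above 1 point towards overdispersion relative to Poisson.
    """
    return stat / (n_obs - n_params)


def rate_ratio(beta, delta=1.0):
    """Multiplicative change in the expected count when a covariate rises by delta."""
    return math.exp(beta * delta)


# Toy numbers for illustration only: NOT the moths data or a fitted model.
y_toy = [0, 3, 5, 2]            # made-up observed counts
mu_toy = [1.0, 2.0, 4.0, 3.0]   # made-up fitted means
x2 = pearson_chi2(y_toy, mu_toy)

print("Pearson statistic:", x2)
print("dispersion estimate:", dispersion(x2, n_obs=len(y_toy), n_params=2))
print("rate ratio for an assumed coefficient of 0.1:", rate_ratio(0.1))
```

The p-value for (b) would then be the upper tail probability of the χ² distribution at the observed statistic, which any statistical package can supply.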
2022-10-18