HW4

Due December 14

1:

The dataset pima comes from a study by the National Institute of Diabetes and Digestive and Kidney Diseases. The purpose of the study was to investigate factors related to diabetes. You may obtain the data from the R package faraway.

1. Produce simple graphical and numerical summaries of the data. Can you find any irregularities in the data? If so, take appropriate steps to correct the problems.

2. Fit a model with the diabetes test result as the response and all the other variables as predictors. Does the model fit the data?

3. What is the difference in the odds of testing positive for diabetes for a woman with a BMI at the first quartile compared with a woman at the third quartile, assuming that all other factors are held constant? Give a confidence interval for this difference.

4. Is diastolic blood pressure significant in the model? Do women who test positive have higher diastolic blood pressures? Explain the distinction between the two questions and discuss why the answers are only apparently contradictory.
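As a hint for question 3: in a logistic model the log-odds are linear in each predictor, so moving BMI from one value to another with everything else fixed multiplies the odds by a constant factor. The arithmetic can be sketched as below. This is an illustration in Python rather than R; the function name and the numbers passed to it are hypothetical placeholders, not estimates from pima, and the interval assumes a normal (Wald) approximation, which is only one of several reasonable choices.

```python
import math

def odds_ratio_ci(beta, se, delta, z=1.96):
    """Odds ratio and Wald interval for a change of `delta` in one predictor.

    beta  : fitted logistic-regression coefficient (log-odds per unit)
    se    : standard error of that coefficient
    delta : change in the predictor, e.g. third quartile minus first quartile
    z     : normal quantile; 1.96 gives an approximate 95% interval
    """
    log_or = beta * delta
    half_width = z * se * abs(delta)
    return (math.exp(log_or),
            math.exp(log_or - half_width),
            math.exp(log_or + half_width))

# Hypothetical inputs, NOT estimates from the pima data.
estimate, lower, upper = odds_ratio_ci(beta=0.1, se=0.02, delta=10.0)
print(estimate, lower, upper)
```

The interval is built on the log-odds scale and then exponentiated, so it is not symmetric about the estimated odds ratio.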

2:

An experiment was conducted to investigate whether incubation temperature affects the sex of turtles. The dataset turtles is available in the faraway package. The data contain three replicates for each temperature, giving the number of male and female turtles born.

1. Plot the proportion of females against the temperature. Comment on the nature of the relationship.

2. Fit a model with the numbers of male and female turtles born as the response and temperature as the predictor. Does the model fit the data?

3. Check whether the data are sparse and whether there are any outliers.

4. Check whether the predictors are correctly expressed. Compute the empirical logits and plot these against temperature.

5. Add a quadratic term in the temperature to the model. Is this model acceptable?
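As a hint for question 4: a commonly used form of the empirical logit for a group with y successes out of n is log((y + 1/2) / (n - y + 1/2)), where the added 1/2 keeps the value finite when a group is all one sex. A minimal sketch in Python for illustration; the helper name and the counts are hypothetical, not taken from turtles, and treating females as the "success" is an assumption.

```python
import math

def empirical_logit(successes, total):
    """Return log((y + 0.5) / (n - y + 0.5)); the 0.5 offsets avoid log(0)."""
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")
    return math.log((successes + 0.5) / (total - successes + 0.5))

# Hypothetical (females, total) counts for three groups, NOT the turtles data.
groups = [(0, 10), (5, 10), (10, 10)]
logits = [empirical_logit(y, n) for y, n in groups]
print(logits)
```

If the linear logistic model is adequate, these values should fall roughly on a straight line when plotted against temperature.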

3:

The dataset esoph comes from a case-control study of esophageal cancer in Ile-et-Vilaine, France.

The data are distributed with R and can be loaded with data(esoph).

1. Fit a binomial GLM with interaction effects between all predictors. Try to simplify the model as far as is reasonable.

2. Use unclass() in R to convert some of the categorical predictors to a numerical representation. Fit a binomial GLM with these predictors and show how the model can be simplified.

3. Does your final model fit the data? Is the test you used accurate for these data?

4. What is the predicted effect of moving to a category one higher in alcohol consumption? Compute a 95% confidence interval for this predicted effect.

5. Since this is a case-control study, what can you conclude about the predicted probability that a 25-year-old who does not smoke or drink will get esophageal cancer?

4:

The salmonella data were collected in a salmonella reverse mutagenicity assay. The predictor is the dose level of quinoline and the response is the number of revertant colonies of TA98 salmonella observed on each of three replicate plates.

1. Plot the data and comment on the relationship between dose and colonies.

2. Compute the mean and variance within each set of observations with the same dose. Plot the variance against the mean and comment on what this says about overdispersion.

3. Fit a model with dose treated as a six-level factor. Does your model fit the data? Is it possible to find a transformation of the predictor that results in a Poisson model that does fit the data?

4. Fit a Poisson model with an overdispersion parameter. Is the fit adequate?

5. Plot the residuals against the fitted values for the previous model. Interpret the plot.

6. Give the predicted mean response for a dose of 500. Compute a 95% confidence interval.
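As a hint for questions 2 and 4: under a Poisson model the variance equals the mean, and one common estimate of the overdispersion parameter is the Pearson statistic divided by the residual degrees of freedom. The sketch below illustrates both calculations in Python. The function names are hypothetical and the counts are made up; they are not the salmonella data.

```python
from statistics import mean, variance

def group_mean_variance(counts_by_dose):
    """Return {dose: (sample mean, sample variance)} for replicate counts."""
    return {dose: (mean(ys), variance(ys)) for dose, ys in counts_by_dose.items()}

def pearson_dispersion(observed, fitted, n_params):
    """Pearson X^2 / (n - p), using the Poisson variance function V(mu) = mu."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)

# Made-up replicate counts for two dose levels, NOT the salmonella data.
counts = {0: [10, 12, 14], 100: [20, 30, 40]}
summary = group_mean_variance(counts)
print(summary)

# With dose treated as a factor, each plate's fitted value is its group mean.
observed = [y for ys in counts.values() for y in ys]
fitted = [summary[dose][0] for dose, ys in counts.items() for _ in ys]
dispersion = pearson_dispersion(observed, fitted, n_params=len(counts))
print(dispersion)
```

A dispersion estimate well above 1, or group variances that are consistently larger than the group means, points toward overdispersion relative to the Poisson model.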