Assignment 2 - MAT 4378

2022

Instructions

1) Please submit your solutions to this assignment in one PDF file in Brightspace. Only one file will be accepted.

2) You can submit a PDF file more than once.  However, only the last submission will be saved.  If you want to modify your submitted assignment, that is fine as long as it is before the deadline.

3) Late submissions of the assignment are not going to be marked.

4) In the second part of the assignment, you must use R for all of your computations.  Please use R markdown to write the solutions for this part.

5) You can submit handwritten solutions for part one of the assignment, but please combine images of your handwritten solutions with the PDF produced with R markdown as one PDF. (See https://imagetopdf.com/ as a possible way to combine images into one PDF.)

6) Deadline: Before 11:59 pm on Friday, October 21

7) You can work in groups of up to four. Only one member of the group should submit the assignment in Brightspace.

Part One

You can provide handwritten solutions for this part, but it is not necessary. You are welcome to try to write your solutions with LaTeX using R markdown. In this part, R may be used only to compute quantiles and probabilities for the normal and chi-square distributions. It is not necessary to provide the R output for Part One.
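As an illustration of the kind of R use permitted in this part, the sketch below computes a normal quantile, a chi-square critical value, and two tail probabilities. The statistic values 4.2 and 2.05 are made up for the example and do not come from any question.

```r
# Normal quantile used for a two-sided 95% interval or test.
z_crit <- qnorm(0.975)

# Chi-square critical value at the 5% level with 1 degree of freedom.
chi_crit <- qchisq(0.95, df = 1)

# Upper-tail p-value for a chi-square statistic (4.2 is a made-up value).
p_chisq <- pchisq(4.2, df = 1, lower.tail = FALSE)

# Two-sided p-value for a standard normal statistic (2.05 is a made-up value).
p_norm <- 2 * pnorm(abs(2.05), lower.tail = FALSE)

print(c(z_crit = z_crit, chi_crit = chi_crit, p_chisq = p_chisq, p_norm = p_norm))
```

With 1 degree of freedom, the chi-square critical value is the square of the two-sided normal critical value, so either distribution can be used for a two-sided test based on a z statistic.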

1. Olestra is a fat substitute for use in snack foods. Because there have been anecdotal reports of gastrointestinal (GI) problems associated with olestra consumption, a randomized, double-blind, placebo-controlled experiment was carried out to compare olestra potato chips to regular potato chips with respect to GI symptoms (“Gastrointestinal Symptoms Following Consumption of Olestra or Regular Triglyceride Potato Chips”, J. of the Amer. Med. Assoc., 1998: 150-152). Among 529 individuals in the TG control group, 17.6% experienced an adverse GI event, whereas among the 563 individuals in the olestra treatment group, 15.8% experienced such an event.

(a) Using the score test, test for the equality of the incidence rates of GI problems between the two groups.

(b) Give the 95% Agresti-Caffo confidence interval for the risk difference. Interpret the results in a sentence.

(c) Give a 95% confidence interval for the relative risk. Interpret the results in a sentence.
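For reference, the score test and the Agresti-Caffo interval are usually written as follows. This is a sketch under the standard textbook definitions, with $y_j$ denoting the number of events among the $n_j$ individuals in group $j$ (notation assumed here, the course notes may differ). Since the problem reports percentages, the counts $y_j$ have to be recovered by rounding $n_j$ times the reported proportion.

```latex
% Score (pooled-variance) test of H_0: \pi_1 = \pi_2
\hat{\pi}_j = \frac{y_j}{n_j}, \qquad
\hat{\pi} = \frac{y_1 + y_2}{n_1 + n_2}, \qquad
z = \frac{\hat{\pi}_1 - \hat{\pi}_2}
         {\sqrt{\hat{\pi}(1 - \hat{\pi})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}

% Agresti-Caffo interval for \pi_1 - \pi_2:
% add one success and one failure to each group, then use a Wald interval
\tilde{\pi}_j = \frac{y_j + 1}{n_j + 2}, \qquad
(\tilde{\pi}_1 - \tilde{\pi}_2) \pm z_{\alpha/2}
\sqrt{\frac{\tilde{\pi}_1(1 - \tilde{\pi}_1)}{n_1 + 2}
    + \frac{\tilde{\pi}_2(1 - \tilde{\pi}_2)}{n_2 + 2}}
```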

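2. Consider two independent binomial experiments. We will model the study as two independent binomial random variables $Y_1, Y_2$, where $Y_j \sim B(n_j, \pi_j)$, for $j = 1, 2$. Consider the estimated odds ratio

$$\hat{R} = \frac{\hat{\pi}_1 / (1 - \hat{\pi}_1)}{\hat{\pi}_2 / (1 - \hat{\pi}_2)},$$

where $\hat{\pi}_j = y_j / n_j$, for $j = 1, 2$. Consider the non-linear statistic $\hat{\theta} = \log(\hat{R})$. Use the delta method to show that the linearized variance estimator for $\hat{\theta}$ is

$$\hat{V}[\hat{\theta}] = \frac{1}{y_1} + \frac{1}{n_1 - y_1} + \frac{1}{y_2} + \frac{1}{n_2 - y_2}.$$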
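As a starting point, the delta method for a function of two independent estimators can be stated as below. This is a sketch of the usual first-order form, with $g$ a generic smooth function (a symbol introduced here for illustration); the remaining algebra is the exercise.

```latex
% First-order (delta method) variance for \hat{\theta} = g(\hat{\pi}_1, \hat{\pi}_2),
% with \hat{\pi}_1 and \hat{\pi}_2 independent
\hat{V}[\hat{\theta}] \approx
\sum_{j=1}^{2}
\left( \left. \frac{\partial g}{\partial \pi_j} \right|_{\pi = \hat{\pi}} \right)^{2}
\hat{V}[\hat{\pi}_j],
\qquad
\hat{V}[\hat{\pi}_j] = \frac{\hat{\pi}_j (1 - \hat{\pi}_j)}{n_j}

% Derivative of the log-odds, needed for each term
\frac{d}{d\pi} \log\!\left( \frac{\pi}{1 - \pi} \right) = \frac{1}{\pi (1 - \pi)}
```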

3. Consider the following table, based on data reported in Table IV of R. Doll and A. B. Hill, Br. Med. J., 739-748, September 30, 1950.

                    Lung cancer
Have smoked      Cases    Controls
Yes                688         650
No                  21          59
Total              709         709

These data come from one of the first studies linking lung cancer and smoking. In 20 hospitals in London, UK, patients admitted with lung cancer in the previous year were queried about their smoking behaviour. For each patient admitted, researchers studied the smoking behaviour of a non-cancer control patient at the same hospital, of the same sex, and within the same 5-year age group. A smoker was defined as a person who had smoked at least one cigarette a day for at least a year. (You can interpret these data as 2 independent random samples, each of size n = 709.)

(a) Identify the response (also called the outcome) variable and the explanatory variable.

(b) Identify the type of study this was.

(c) Can you use these data to compare smokers with nonsmokers in terms of the proportion who suffered lung cancer? Why or why not?

(d) Summarize the association with a descriptive parameter. Explain how to interpret it. Also give a 95% confidence interval for this parameter.

4. Consider the Poisson distribution with parameter $\mu$, where $\mu > 0$. Its probability mass function is

$$p_\mu(y) = \frac{e^{-\mu} \mu^{y}}{y!}, \quad \text{for } y = 0, 1, 2, \ldots$$

(a) Show that the Poisson distribution belongs to the exponential family of distributions with natural parameter $\eta = \log(\mu)$, where $\log(x) = \ln(x)$, and show that its cumulant function is $c(\eta) = e^{\eta}$.

(b) Use the cumulant function to show that the mean and the variance for this distribution are $E[Y] = \mu$ and $V[Y] = \mu$, respectively.

(c) Suppose that we have a random sample of size $n = 100$ from a Poisson distribution with mean $\mu$. The sample mean of the 100 observations is $\bar{y} = 22.5$. Use the score test to test

$$H_0 : \mu = 20 \quad \text{against} \quad H_a : \mu \neq 20.$$

Give the observed value of the score test statistic, the p-value, and the conclusion at α = 5%.
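For reference, a score test for a scalar parameter is commonly set up as follows. This is a sketch under the usual likelihood definitions, with $\ell$, $U$, and $I$ as notation assumed here for the log-likelihood, the score, and the Fisher information. Its defining feature is that both quantities are evaluated at the null value $\mu_0$ rather than at the maximum likelihood estimate.

```latex
% Score function and Fisher information for the full sample
U(\mu) = \frac{\partial \ell(\mu)}{\partial \mu}, \qquad
I(\mu) = -E\!\left[ \frac{\partial^{2} \ell(\mu)}{\partial \mu^{2}} \right]

% Score statistic, evaluated at the null value \mu_0
z = \frac{U(\mu_0)}{\sqrt{I(\mu_0)}}, \qquad
z^{2} \overset{\text{approx.}}{\sim} \chi^{2}_{1} \quad \text{under } H_0
```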

Part Two

Please use R for all computations and for building graphs in this part of the assignment. Note that some of these questions also require answers that do not involve R. R is to be used only for the computations and to produce graphs. For some of these questions, the R output alone will not be sufficient: you will need to interpret, describe, and give conclusions.

5. Consider data from Aseffa et al. (1998) that examine the prevalence of HIV among women visiting health care clinics in northwest Ethiopia. Along with testing individuals for HIV, additional information was collected on each woman, such as condom use. The data are given in the following contingency table:

                     HIV
Condom     Positive    Negative    Total
Never           251          34      285
Ever             48           5       53

This is an example of multinomial sampling, where each of the n statistical units is described in terms of two binary variables. In this context, we can always condition on the observed row totals. By doing so, we obtain (conditionally) independent binomial experiments in the rows, from which we can estimate

$\pi_1 = P(\text{HIV positive} \mid \text{never used condom})$ and $\pi_2 = P(\text{HIV positive} \mid \text{ever used condom})$.

(a) Perform the score test at α = 5% to test the equality of $\pi_1$ and $\pi_2$.

(b) Give the 95% Agresti-Caffo confidence interval for $\pi_1 - \pi_2$.

(c) Estimate the risk ratio $RR = \pi_1 / \pi_2$, and interpret the value in the context of this problem.

(d) Give a 95% confidence interval for the risk ratio $RR = \pi_1 / \pi_2$.

(e) Estimate the odds ratio and interpret the estimate.

(f) Give a 95% confidence interval for the odds ratio.
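One possible way to organize the computations for parts (a) to (f) in R is sketched below. It assumes the table reads Never: 251 positive and 34 negative, Ever: 48 positive and 5 negative, and it uses Wald intervals on the log scale for the risk ratio and odds ratio, which may not be the interval the course intends. All object names are illustrative. The interpretation and conclusions still have to be written out separately.

```r
# Counts as read from the contingency table (assumed layout):
# rows = condom use, columns = HIV status.
hiv <- matrix(c(251, 34,
                 48,  5),
              nrow = 2, byrow = TRUE,
              dimnames = list(Condom = c("Never", "Ever"),
                              HIV = c("Positive", "Negative")))

y <- hiv[, "Positive"]  # HIV-positive counts in each row
n <- rowSums(hiv)       # row totals, treated as fixed
p_hat <- y / n          # estimates of pi_1 and pi_2
z_crit <- qnorm(0.975)  # quantile for 95% intervals

# (a) Score test: with correct = FALSE, prop.test() reports the Pearson
# chi-square statistic, the square of the pooled-variance z statistic.
score <- prop.test(x = y, n = n, correct = FALSE)

# (b) Agresti-Caffo: add one success and one failure to each group,
# then build a Wald interval from the adjusted proportions.
p_adj <- (y + 1) / (n + 2)
se_ac <- sqrt(sum(p_adj * (1 - p_adj) / (n + 2)))
ci_ac <- unname(p_adj[1] - p_adj[2]) + c(-1, 1) * z_crit * se_ac

# (c)-(d) Risk ratio with a Wald interval built on the log scale.
rr <- unname(p_hat[1] / p_hat[2])
se_log_rr <- sqrt(sum(1 / y - 1 / n))
ci_rr <- exp(log(rr) + c(-1, 1) * z_crit * se_log_rr)

# (e)-(f) Odds ratio with a Wald interval built on the log scale.
or <- (hiv[1, 1] * hiv[2, 2]) / (hiv[1, 2] * hiv[2, 1])
se_log_or <- sqrt(sum(1 / hiv))
ci_or <- exp(log(or) + c(-1, 1) * z_crit * se_log_or)

print(score)
print(list(agresti_caffo = ci_ac, rr = rr, ci_rr = ci_rr,
           or = or, ci_or = ci_or))
```

Building the intervals by hand rather than relying on a contributed package keeps each formula visible in the R markdown output, which makes the written interpretation easier to tie back to the numbers.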