STAC51: Assignment 2

Deadline to hand in: Sunday, Feb. 26, 2023, 10:00 pm

Total: 100 points

Please submit three files: the R Markdown file, the knitted Word or PDF file produced from the R Markdown file, and a scanned hand-written solution (if you have one).

Note: Whenever you use R to generate random numbers, set the seed to your student number. This can be done by simply adding the command set.seed(your student number) before generating the random numbers.
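A minimal sketch of what setting the seed achieves. The value of student_number below is a made-up placeholder, not a real student number; note that set.seed() expects a value that fits in a 32-bit integer.

```r
# Hypothetical student number, used only for illustration.
student_number <- 1001234567

set.seed(student_number)
first_draw <- runif(3)

set.seed(student_number)   # resetting the seed restarts the same random stream
second_draw <- runif(3)

# The two draws agree exactly because the seed was reset before each one.
identical(first_draw, second_draw)
```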

Q. 1 (25 pts) In this question we will do a simulation study to investigate some basic properties of the confidence intervals for odds ratios for contingency tables based on multinomial sampling.

(a) (10 pts) Use R to generate ten (n = 10) 2 × 2 contingency tables with total count (i.e., grand total) N = 100 and with known cell probabilities (π11, π12, π21, π22) = (0.2, 0.3, 0.3, 0.2) from a multinomial distribution:

nij ~ multinomial(N, π11, π12, π21, π22)

Please don’t forget to use the command set.seed(your student number) right before the command generating the random data.

i. Print out your results (the 10 tables you generated).

ii. What is the true odds ratio θ (i.e. population odds ratio) for these tables?

iii. For each of these generated tables, calculate the odds ratio and a 95% large-sample confidence interval for the true odds ratio. Print all your table cell counts (i.e., for the 10 tables), the estimated odds ratios (i.e., θ̂), and the confidence intervals (lower and upper limits).

iv. How many of the 10 intervals contain the true odds ratio, θ?

(b) (10 pts) Repeat part (a), but this time with n = 1,000,000. Do not print the tables, estimates, or intervals this time; instead,

i. Calculate the proportion of the intervals containing θ .

ii. Comment on your value.

Note: Any table with a zero cell count has a sample odds ratio equal to 0 or ∞ (or undefined, if both the numerator and the denominator are zero). Replace any zero cell count by 0.5; this is often done when dealing with zero cell counts.

(c) (5 pts) Repeat part (b), but this time with N = 15 (i.e., still a million tables, but each table with grand total 15), and comment on your result.
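The building blocks of this simulation can be sketched as follows. This is only an illustration under assumed settings: the helper name wald_or_ci, the seed, the cell probabilities, and the table size are all placeholders rather than the values required above. The interval is the usual large-sample (Wald) interval computed on the log scale, with zero cells replaced by 0.5 as described in the note.

```r
# Wald (large-sample) confidence interval for the odds ratio of a 2x2 table.
# `counts` is c(n11, n12, n21, n22); zero cells are replaced by 0.5.
wald_or_ci <- function(counts, level = 0.95) {
  counts[counts == 0] <- 0.5
  theta_hat <- (counts[1] * counts[4]) / (counts[2] * counts[3])
  se_log <- sqrt(sum(1 / counts))          # SE of log(theta_hat)
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = theta_hat,
    lower = exp(log(theta_hat) - z * se_log),
    upper = exp(log(theta_hat) + z * se_log))
}

# One multinomial draw with illustrative (not the assignment's) settings.
set.seed(123)                        # placeholder seed
probs <- c(0.25, 0.25, 0.25, 0.25)   # hypothetical cell probabilities
draw <- rmultinom(1, size = 50, prob = probs)[, 1]

matrix(draw, nrow = 2, byrow = TRUE) # rows: (n11, n12) and (n21, n22)
wald_or_ci(draw)
```

For many tables at once, rmultinom(n, size, prob) returns a matrix with one column per table, so the same helper can be applied column by column (for example with apply(tables, 2, wald_or_ci)).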

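Q. 2 (10 pts) In this question, we will prove the formula $SE(\log\hat\theta) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$ using the delta method. The delta method is a useful method to derive the asymptotic variance of a test statistic. Suppose that $\hat\theta = \frac{p_{11}\,p_{22}}{p_{12}\,p_{21}}$, where $p_{ij}$ is defined as below.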

               Column
               1       2       Total
Row    1       p11     p12     p1+
       2       p21     p22     p2+
Total          p+1     p+2     N = n++

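We want to derive the variance of $\log(\hat\theta)$. The multivariate version of the delta method is

$$\mathrm{Var}\big(f(p_{11}, p_{12}, p_{21}, p_{22})\big) \approx \nabla f(p_{11}, p_{12}, p_{21}, p_{22}) \; \mathrm{Cov}(p_{11}, p_{12}, p_{21}, p_{22}) \; \nabla f(p_{11}, p_{12}, p_{21}, p_{22})^T$$

where $f(p_{11}, p_{12}, p_{21}, p_{22}) = \log\hat\theta$ and $\nabla f$ is the gradient vector. That is,

$$\nabla f(p_{11}, p_{12}, p_{21}, p_{22}) = \left( \frac{\partial f}{\partial p_{11}}, \ldots, \frac{\partial f}{\partial p_{22}} \right)$$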

We assume multinomial sampling, since the total number of observations is fixed.
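As a warm-up, here is the one-dimensional version of the same argument for a single sample proportion. It is only an illustration of the technique: the symbols $\hat p$, $n$, and $g$ are introduced here and are not part of the question, and the result is not the formula to be proved.

```latex
% Univariate delta method: Var(g(\hat p)) \approx [g'(p)]^2 Var(\hat p).
% Take g(p) = \log p and a binomial proportion with Var(\hat p) = p(1-p)/n.
\[
\mathrm{Var}(\log \hat p)
  \approx \left[ g'(p) \right]^2 \mathrm{Var}(\hat p)
  = \frac{1}{p^2} \cdot \frac{p(1-p)}{n}
  = \frac{1-p}{n\,p}.
\]
```

The multivariate version replaces the derivative $g'(p)$ by the gradient and the variance of $\hat p$ by the covariance matrix of the cell proportions.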

Q. 3 (10 pts) The table below contains the results of a study comparing radiation therapy with surgery in treating cancer of the larynx. Do not use the R function fisher.test. However, you may use the R function dhyper to evaluate the expressions.

                     Cancer Controlled    Cancer Not Controlled
Surgery              n11 = 21             n12 = 2
Radiation therapy    n21 = 15             n22 = 3

(a) (5 pts) Test against the directional alternative that surgery is better than radiation therapy in controlling cancer of the larynx, using Fisher’s exact test. Find the p-value. What is your conclusion from the test?

(b) (5 pts) Test against the two-sided alternative that surgery and radiation therapy differ in controlling cancer of the larynx, using Fisher’s exact test. Find the p-value. What is your conclusion from the test?
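A minimal sketch of how dhyper can produce exact p-values, shown on a different, illustrative table (the layout of Fisher's classic tea-tasting example) rather than on the data above. The variable names are placeholders. The two-sided p-value here uses one common convention, summing the probabilities of all tables that are no more likely than the observed one; check which convention the course uses.

```r
# Illustrative 2x2 table (tea-tasting layout, not the assignment data):
# row totals 4 and 4, first column total 4, observed n11 = 3.
n11  <- 3   # observed count in cell (1, 1)
row1 <- 4   # first row total   ("white balls" in dhyper's urn model)
row2 <- 4   # second row total  ("black balls")
col1 <- 4   # first column total (number of balls drawn)

support <- max(0, col1 - row2):min(row1, col1)   # possible values of n11
probs <- dhyper(support, m = row1, n = row2, k = col1)

# One-sided p-value: P(n11 >= observed) with all margins fixed.
p_one_sided <- sum(probs[support >= n11])

# Two-sided p-value: total probability of tables no more likely than the
# observed one (small relative tolerance guards against floating-point ties).
p_obs <- dhyper(n11, m = row1, n = row2, k = col1)
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])
```

For this illustrative table the hypergeometric probabilities are 1/70, 16/70, 36/70, 16/70, and 1/70, so the one-sided p-value is 17/70 and the two-sided p-value is 34/70.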

Q. 4 (15 pts) A 2010 survey asked 827 randomly sampled registered voters in California, “Do you support, or do you oppose, drilling for oil and natural gas off the coast of California? Or do you not know enough to say?” Below is the distribution of responses, separated by whether or not the respondent is a college graduate.

               College Grad
               Yes     No
Support        154     132
Oppose         180     126
Do not know    104     131
Total          438     389

(a) (6 pts) Test whether the two variables are independent using

i. Pearson’s X² test

ii. The likelihood ratio G² test of independence.

Please write down every term in each test statistic explicitly before evaluating it. For each test, report the degrees of freedom and the P-value. Interpret the results.

(b) (3 pts) Test whether the proportion of college graduates supporting offshore drilling equals the proportion of non-college graduates supporting offshore drilling, using a two-sample test of proportions. Obtain the P-value.

(c) (3 pts) Do the conclusions of the tests in parts (a) and (b) agree? Is this surprising, is it possible, or is something wrong? Explain.

(d) (3 pts) Obtain the standardized residuals for the chi-square test in (a) and describe the association pattern between the two variables.
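The quantities in parts (a) and (d) can be computed term by term as sketched below. The table tab is hypothetical (made-up counts and labels, not the survey data), and the object names are placeholders.

```r
# Hypothetical 3x2 table of counts (not the survey data).
tab <- matrix(c(30, 20,
                25, 25,
                15, 35), nrow = 3, byrow = TRUE,
              dimnames = list(answer = c("A", "B", "C"),
                              group  = c("g1", "g2")))

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

x2 <- sum((tab - expected)^2 / expected)     # Pearson statistic
g2 <- 2 * sum(tab * log(tab / expected))     # likelihood-ratio statistic
df <- (nrow(tab) - 1) * (ncol(tab) - 1)

p_x2 <- pchisq(x2, df, lower.tail = FALSE)
p_g2 <- pchisq(g2, df, lower.tail = FALSE)

# Standardized residuals:
# (observed - expected) / sqrt(expected * (1 - p_i+) * (1 - p_+j))
row_p <- rowSums(tab) / sum(tab)
col_p <- colSums(tab) / sum(tab)
std_res <- (tab - expected) / sqrt(expected * outer(1 - row_p, 1 - col_p))
```

The hand computation can be compared with chisq.test(tab, correct = FALSE), whose statistic and stdres components should match x2 and std_res. The G² formula as written assumes no zero counts.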

Q. 5 (15 pts) The table below shows results of an eight-center clinical trial to compare a drug to placebo for curing an infection. At each center, subjects were randomly assigned to groups.

                       Response
Center   Treatment   Success   Failure
1        Drug        11        25
         Control     10        27
2        Drug        16        4
         Control     22        10
3        Drug        14        5
         Control     7         12
4        Drug        2         14
         Control     1         16
5        Drug        6         11
         Control     0         12
6        Drug        1         10
         Control     0         10
7        Drug        1         4
         Control     1         8
8        Drug        4         2
         Control     6         1

(a) Find the marginal table for Treatment (drug, placebo) and Response (success, failure). Calculate and interpret the (sample) marginal odds ratio of the marginal table.

(b) Explain why it’s not a good idea to test the independence of Treatment and Response using the marginal table of Treatment and Response and ignore Center.

(c) Please test the conditional independence of Treatment and Response given Center using the Cochran-Mantel-Haenszel test. Please calculate the expected count and the variance for the cell (Drug, Success) for each of the 8 centers, write down every term in the numerator and the denominator of the CMH statistic explicitly before evaluating them, and find the P-value.

(d) Calculate and interpret Mantel-Haenszel’s estimate of the common odds ratio between Treatment (drug vs. placebo) and Response (success, failure). Please write down every term in the numerator and the denominator of the estimate explicitly before evaluating it.

(e) Verify your calculation in the previous two parts using the R command mantelhaen.test.
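The computations in parts (c) to (e) can be sketched on a small hypothetical 2 × 2 × 3 array. The counts, labels, and object names below are made up for illustration and are not the trial data. The CMH statistic is computed without a continuity correction, so the check calls mantelhaen.test with correct = FALSE.

```r
# Hypothetical 2 x 2 x 3 array (treatment x response x stratum).
# array() fills column-major: within each stratum the order is
# drug/success, control/success, drug/failure, control/failure.
strata <- array(c(12, 8, 6, 10,
                  9, 5, 7, 11,
                  15, 10, 5, 9),
                dim = c(2, 2, 3),
                dimnames = list(treatment = c("drug", "control"),
                                response  = c("success", "failure"),
                                stratum   = c("s1", "s2", "s3")))

n11 <- strata[1, 1, ]; n12 <- strata[1, 2, ]
n21 <- strata[2, 1, ]; n22 <- strata[2, 2, ]
n_k <- n11 + n12 + n21 + n22

# Expected count and hypergeometric variance of n11 in each stratum
# under conditional independence.
mu_k  <- (n11 + n12) * (n11 + n21) / n_k
var_k <- (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22) /
  (n_k^2 * (n_k - 1))

cmh <- sum(n11 - mu_k)^2 / sum(var_k)   # CMH statistic, 1 df
p_value <- pchisq(cmh, df = 1, lower.tail = FALSE)

# Mantel-Haenszel estimate of the common odds ratio.
or_mh <- sum(n11 * n22 / n_k) / sum(n12 * n21 / n_k)

check <- mantelhaen.test(strata, correct = FALSE)
```

The statistic and estimate components of check should agree with cmh and or_mh.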

Q. 6 (25 pts) Refer to the “Alcohol Use and Infant Malformation” example and the data in Table 2.6 on Page 44 of [ICDA3] (our textbook).

Let X = mother’s alcohol consumption and Y = whether a baby has a sex organ malformation. For the five levels of alcohol consumption (0, < 1, 1-2, 3-5, ≥ 6 drinks per day), use the midpoints (0, 0.5, 1.5, 4.0, 7.0) as the mother’s true alcohol consumption X. You can get the data into R as follows:

mydata = data.frame(drinks = c(0, 0.5, 1.5, 4, 7),
                    absent = c(17066, 14464, 788, 126, 37),
                    present = c(48, 38, 5, 1, 1))

mydata$total = with(mydata, absent + present)

mydata$proportion = with(mydata, present/total)

(a) (2 pts) Let π(x) be the probability of a baby having a sex organ malformation if the mother’s alcohol consumption during pregnancy was x. We want to fit a linear probability model, π(x) = α + βx. Obtain the maximum likelihood (ML) fit of the linear probability model with the glm() function.

(b) (8 pts) From the R summary output obtained above,

i. Write down the fitted regression equation for the model π(x) = α + βx.

ii. Interpret the intercept and slope in the context of the data.

iii. Estimate the probabilities of malformation for the lowest and highest alcohol levels: π(0) and π(7).

iv. Estimate and interpret the relative risk comparing the two levels in part (iii).

I suggest finding the estimated probabilities “by hand”, without using the R function predict.

(c) (2 pts) From the summary() output in part (a), get a 90% Wald confidence interval for the coefficient β. Again, I suggest finding this by hand.

(d) (10 pts) Fit a logistic regression model

$$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}$$

i. Write down the fitted regression equation $\hat\pi(x)$ for the model.

ii. Interpret the intercept and slope in the context of the data.

iii. Estimate the probabilities of malformation for the lowest and highest alcohol levels: π(0) and π(7).

iv. Estimate the relative risk comparing the two levels in part (iii).

v. Calculate the odds ratio of malformations for alcohol levels 7 vs. 0.
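A minimal sketch of the “by hand” workflow for a logistic fit, using small hypothetical grouped data rather than mydata. The data frame toy, its column names, and all object names are placeholders. It pulls the estimates and standard errors from the coefficient table, builds a 90% Wald interval for the slope, and evaluates the fitted probabilities directly from the fitted equation.

```r
# Hypothetical grouped data (not the malformation data).
toy <- data.frame(dose   = c(0, 1, 2, 3, 4),
                  events = c(2, 4, 7, 12, 15),
                  trials = c(40, 40, 40, 40, 40))

fit <- glm(cbind(events, trials - events) ~ dose,
           family = binomial(link = "logit"), data = toy)

est <- coef(summary(fit))   # columns: Estimate, Std. Error, z value, Pr(>|z|)
alpha_hat <- est["(Intercept)", "Estimate"]
beta_hat  <- est["dose", "Estimate"]
se_beta   <- est["dose", "Std. Error"]

# 90% Wald interval for the slope: estimate +/- z_{0.95} * SE.
wald_90 <- beta_hat + c(-1, 1) * qnorm(0.95) * se_beta

# Fitted probabilities "by hand" at the lowest and highest dose.
pi_hat <- function(x) {
  exp(alpha_hat + beta_hat * x) / (1 + exp(alpha_hat + beta_hat * x))
}
pi_low  <- pi_hat(0)
pi_high <- pi_hat(4)

relative_risk <- pi_high / pi_low
odds_ratio    <- exp(beta_hat * (4 - 0))   # odds ratio for dose 4 vs. dose 0
```

The hand-computed probabilities can be checked against predict(fit, newdata, type = "response"), and the odds ratio against the ratio of fitted odds at the two doses.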

(e) (3 pts) Graph a scatterplot of the sample proportions of malformation vs. the level of alcohol consumption (0, 0.5, 1.5, 4, 7). On the same graph, show

i. the fitted line by the ML method from part (a),

ii. the fitted line by the least squares method,

iii. the fitted logistic curve by the ML method from part (d).

Why do the slopes of the two straight lines differ so much?
