MAST90084: Statistical Modelling Assignment 1

1. Let $X$ and $Y$ be two categorical random variables, $X$ with $I$ different categories identified with the set $\{1, \dots, I\}$ and $Y$ with $J$ different categories identified with the set $\{1, \dots, J\}$. Suppose observations of the variable pair $(X, Y)$ are tabulated in an $I \times J$ contingency table. Using standard notation, for a given $(i, j) \in \{1, \dots, I\} \times \{1, \dots, J\}$, $n_{ij}$ is the entry in the $(i, j)$-th cell, which denotes the count of observations with $X$ equal to its $i$-th category and $Y$ equal to its $j$-th category. A Poisson sampling model for the contingency table assumes that the $n_{ij}$'s are independently distributed with

$$n_{ij} \sim \mathrm{Poi}(\mu_{ij}),$$

where $\mu_{ij}$ denotes the Poisson mean for the cell count $n_{ij}$.

(a) Derive the conditional joint distribution of $\{n_{ij}\}_{(i,j) \in \{1, \dots, I\} \times \{1, \dots, J\}}$ given $n$, where

$$n := \sum_{1 \le i \le I,\; 1 \le j \le J} n_{ij}.$$

Identify the name of this distribution, and explicitly state what its parameter values are in terms of $\{\mu_{ij}\}_{(i,j) \in \{1, \dots, I\} \times \{1, \dots, J\}}$ and $n$. [5]

(b) Let $I = J = 2$. The quantity $\dfrac{\mu_{11}\,\mu_{22}}{\mu_{12}\,\mu_{21}}$, also known as the odds ratio, measures the association between $X$ and $Y$. What should be the value of the odds ratio if $X$ and $Y$ are independent, and why? [3]
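As a purely numerical illustration of the quantity in (b), the sketch below evaluates the usual cross-product form of the odds ratio for a 2 x 2 table of Poisson means. The helper name `odds.ratio` and the values in `mu` are hypothetical and are not taken from the assignment.

```r
# Odds ratio (cross-product ratio) of a 2 x 2 table of Poisson means.
# `odds.ratio` is a hypothetical helper name, not part of the assignment.
odds.ratio <- function(mu) {
  stopifnot(is.matrix(mu), all(dim(mu) == c(2, 2)), all(mu > 0))
  (mu[1, 1] * mu[2, 2]) / (mu[1, 2] * mu[2, 1])
}

# Hypothetical means, chosen only for illustration
mu <- matrix(c(10, 20,
               30, 40), nrow = 2, byrow = TRUE)
theta <- odds.ratio(mu)
print(theta)
```

Trying other positive matrices in place of `mu` shows how the ratio moves away from or towards 1 as the row and column patterns change.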

2. Data in the following $2 \times 2 \times 3$ contingency table were used to study the effect of passive smoking on lung cancer. The table summarizes the results of case-control studies from 3 countries for nonsmoking women married to smokers. (Source: Blot and Fraumeni, J. Nat. Cancer Inst., 77:993-1000 (1986) and Agresti (1996).)

Country   Spouse Smoked   Cases   Controls
Japan     No                 21         82
Japan     Yes                73        188
UK        No                  5         16
UK        Yes                19         38
USA       No                 71        249
USA       Yes               137        363

(a) A log-linear model mod1 can be fitted to the data, with the results given in the following R output. Give the mathematical formula of the form $\ln(\mu) = \dots$ for the mean model of mod1, where $\mu$ is the mean of the response. Any dummy variables in your formula should be explicitly defined. [5]

> pasSmoking.dat=data.frame(freq=c(21,73,5,19,71,137,82,188,16,38,249,363))
> pasSmoking.dat$Cnt=factor(rep(c("Japan","UK","USA"), times=2, each=2))
> pasSmoking.dat$Smo=factor(rep(c("No","Yes"), times=6))
> pasSmoking.dat$Can=factor(rep(c("Case","Control"), each=6))
> pasSmoking.dat
   freq   Cnt Smo     Can
1    21 Japan  No    Case
2    73 Japan Yes    Case
3     5    UK  No    Case
4    19    UK Yes    Case
5    71   USA  No    Case
6   137   USA Yes    Case
7    82 Japan  No Control
8   188 Japan Yes Control
9    16    UK  No Control
10   38    UK Yes Control
11  249   USA  No Control
12  363   USA Yes Control
> mod1=glm(freq~Cnt+Smo+Can+Cnt:Smo+Cnt:Can+Smo:Can, family=poisson, data=pasSmoking.dat)
> anova(mod1, test="Chisq")

Analysis of Deviance Table; Model: poisson; Link: log; Response: freq

Terms added sequentially (first to last)

        Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                       11    1168.85
Cnt      2   726.43         9     442.42 < 2.2e-16
Smo      1   112.52         8     329.90 < 2.2e-16
Can      1   307.56         7      22.34 < 2.2e-16
Cnt:Smo  2    15.50         5       6.84 0.0004316
Cnt:Can  2     1.05         3       5.80 0.5919109
Smo:Can  1     5.56         2       0.24 0.0184215
> 1-pchisq(0.24,2)
[1] 0.8869204
> 1-pchisq(5.80,3)
[1] 0.1217566
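The tail probabilities in the output can be re-derived from the rounded deviances alone, which may help when reading the table. The sketch below uses `pchisq` with `lower.tail = FALSE`, equivalent to the `1-pchisq(...)` calls above. The variable names are hypothetical, and because the inputs are the rounded values printed in the table, the first result only approximately matches the `P(>|Chi|)` column.

```r
# Deviance drop for Smo:Can (5.56 on 1 df), copied from the table above
p.smo.can <- pchisq(5.56, df = 1, lower.tail = FALSE)

# Residual deviance of mod1 (0.24 on 2 df)
p.resid.mod1 <- pchisq(0.24, df = 2, lower.tail = FALSE)

# Residual deviance after Cnt:Can is added, before Smo:Can (5.80 on 3 df)
p.resid.reduced <- pchisq(5.80, df = 3, lower.tail = FALSE)

print(c(p.smo.can = p.smo.can,
        p.resid.mod1 = p.resid.mod1,
        p.resid.reduced = p.resid.reduced))
```

Which of these quantities is relevant to a given hypothesis depends on the pair of nested models being compared; the code only reproduces the arithmetic.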

(b) Extending the notation from Question 1, for the current contingency table we can also use $n_{ijk}$ to denote the count in each cell, where $i \in \{1, 2\}$, $j \in \{1, 2\}$, $k \in \{1, 2, 3\}$ are indices corresponding to Can (variable $X$), Smo (variable $Y$) and Cnt (variable $Z$) respectively. Moreover, if the $n_{ijk}$ are independently distributed with

$$n_{ijk} \sim \mathrm{Poi}(\mu_{ijk}),$$

one can, for any $k \in \{1, 2, 3\}$, define the odds ratio

$$\theta_{XY(k)} = \frac{\mu_{11k}\,\mu_{22k}}{\mu_{12k}\,\mu_{21k}}$$

for the partial table with $Z = k$. The table is said to have homogeneous $XY$ association when $\theta_{XY(1)} = \theta_{XY(2)} = \theta_{XY(3)}$. Explain why the model in part (a) has homogeneous $XY$ association. [5]
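A numerical check can accompany the algebraic argument asked for here. The sketch below refits mod1 on a copy of the data (named `dat`, a hypothetical name), arranges observed counts and fitted means into 2 x 2 x 3 arrays indexed as [Can, Smo, Cnt], and computes the cross-product ratio within each country. The helper `cross.ratio` is also hypothetical, and the check illustrates the property without proving it.

```r
# Rebuild the data exactly as in the R output of part (a)
dat <- data.frame(freq = c(21, 73, 5, 19, 71, 137, 82, 188, 16, 38, 249, 363))
dat$Cnt <- factor(rep(c("Japan", "UK", "USA"), times = 2, each = 2))
dat$Smo <- factor(rep(c("No", "Yes"), times = 6))
dat$Can <- factor(rep(c("Case", "Control"), each = 6))

mod1 <- glm(freq ~ Cnt + Smo + Can + Cnt:Smo + Cnt:Can + Smo:Can,
            family = poisson, data = dat)
dat$mu.hat <- fitted(mod1)

# 2 x 2 x 3 arrays indexed as [Can, Smo, Cnt]
n.arr  <- xtabs(freq   ~ Can + Smo + Cnt, data = dat)
mu.arr <- xtabs(mu.hat ~ Can + Smo + Cnt, data = dat)

# Cross-product ratio of a 2 x 2 slice (hypothetical helper)
cross.ratio <- function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])

theta.obs <- apply(n.arr, 3, cross.ratio)   # from observed counts
theta.fit <- apply(mu.arr, 3, cross.ratio)  # from fitted means of mod1
print(rbind(observed = theta.obs, fitted = theta.fit))
```

The observed ratios differ from country to country, while the ratios built from the fitted means of mod1 coincide up to floating-point error.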

(c) Based on the displayed R output in (a), test the significance of the interaction effect Smo:Can at significance level 0.05, eliminating the effects of all other terms in mod1. Provide your conclusion with clear explanation. [4]

(d) Based on the displayed R output in (a), test the adequacy/goodness-of-fit of the model

Cnt+Smo+Can+Cnt:Smo+Cnt:Can

at significance level 0.05. Provide your conclusion with clear explanation.

(e) Are your conclusions in (c) and (d) contradictory? You must give an explanation to get any score. [5]

3. A variable $Y$ taking values in $\{0, 1, 2, \dots\}$ has a Negative Binomial (NB) distribution if its probability mass function has the form

$p(Y = y; \mu, \kappa) =$

for $y = 0, 1, \dots$, where $\mu$ is the mean of $Y$.

(a) When $\kappa$ is considered as fixed (or known), the NB distribution belongs to the exponential dispersion model (EDM) family discussed in class. Write out its form as an EDM explicitly. In particular, you have to identify the natural parameter $\theta$ and the dispersion parameter $\phi$ in terms of $\mu$ and $\kappa$ whenever appropriate, and identify $b(\cdot)$ (as a function of $\theta$). You can simply take the weight $\omega$ to be 1. [5]

(b) Let $\sigma^2$ be the variance of $Y$. From your answer above, derive the formula for $\sigma^2$ as a function of $\mu$. Why do we say that the NB distribution can be used as a likelihood model to handle "overdispersion" compared to the Poisson distribution? [4]
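To get a feel for overdispersion numerically, the sketch below compares the variance of a negative binomial distribution with that of a Poisson distribution having the same mean. It uses R's own `dnbinom(size = , mu = )` parametrization; how R's `size` relates to the $\kappa$ of this question is not assumed here, and the values `mu = 3` and `size = 2` are hypothetical.

```r
mu <- 3        # hypothetical common mean
size <- 2      # hypothetical value of R's `size` argument (not necessarily kappa)
y <- 0:2000    # support truncated where the remaining mass is negligible

# Mean and variance of the negative binomial, computed from its pmf
p.nb <- dnbinom(y, size = size, mu = mu)
mean.nb <- sum(y * p.nb)
var.nb <- sum((y - mean.nb)^2 * p.nb)

# Variance of a Poisson distribution with the same mean
p.pois <- dpois(y, lambda = mu)
var.pois <- sum((y - mu)^2 * p.pois)

print(c(mean.nb = mean.nb, var.nb = var.nb, var.pois = var.pois))
```

With these values both distributions have mean 3, but the negative binomial variance is 7.5 against 3 for the Poisson; varying `size` shows how the gap widens or shrinks.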

2023-03-24
