
Department of Statistics and Data Science

ST5213 Categorical Data Analysis II

Revision

1. Duchenne Muscular Dystrophy (DMD) is a genetically transmitted disease passed from a mother to her children. Boys with the disease usually die at a young age, but affected girls usually do not suffer symptoms, may unknowingly carry the disease and may pass it to their offspring. It is believed that about 1 in 3,300 women are DMD carriers. A woman might suspect she is a carrier when a related male child develops the disease. Doctors must rely on some kind of test to detect the presence of the disease. This data frame contains data on two enzymes in the blood, creatine kinase (CK) and hemopexin (H), for 38 known DMD carriers and 82 women who are not carriers. It is desired to use these data to obtain an equation for indicating whether a woman is a likely carrier. Specifically, the following data were collected:

Variable	Description
Group	Indicator whether the woman has DMD (Case) or not (Control)
CK	Creatine kinase reading
H	Hemopexin reading

R was used to explore these data and to fit various models to them. Using the R output given on pages 5–9 in the appendix, answer the following questions:

(a) Can you think of a reason why the analysis looks at CK and log(CK) as possible explanatory variables? Comment briefly.

(b) Do you think it would be necessary to also consider log(H) as a possible regressor? Comment briefly.

(d) Except for the estimate of the intercept, interpret each estimate in your preferred model in a sentence or two.

2. Schoener (1968) collected information on the distribution of two Anolis lizard species (A. opalinus and A. grahamii) to see if their ecological niches were different in terms of where and when they perched to prey on insects. Perches were classified by twig diameter, their height in the bush, whether the perch was in sun or shade when the lizard was counted, and the time of day at which they were foraging. The observed data are given in the following contingency table:

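				Lizard species
Height	Diameter	Sun	Time	A. grahamii	A. opalinus
Low	Thin	Sun	Early
Low	Thin	Sun	MidDay
Low	Thin	Sun	Late
Low	Thin	Shade	Early
Low	Thin	Shade	MidDay
Low	Thin	Shade	Late
Low	Thick	Sun	Early
Low	Thick	Sun	MidDay
Low	Thick	Sun	Late
Low	Thick	Shade	Early
Low	Thick	Shade	MidDay
Low	Thick	Shade	Late
High	Thin	Sun	Early
High	Thin	Sun	MidDay
High	Thin	Sun	Late
High	Thin	Shade	Early
High	Thin	Shade	MidDay
High	Thin	Shade	Late
High	Thick	Sun	Early
High	Thick	Sun	MidDay
High	Thick	Sun	Late
High	Thick	Shade	Early
High	Thick	Shade	MidDay
High	Thick	Shade	Late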

Assume we want to analyze this contingency table using loglinear models. Further assume we have a dataframe, say lizards, with variables H (height of perch), D (diameter of perch), S (whether perch is in the sun), T (time of day), Sp (observed species) and Count (number of observations in each category).
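As a rough illustration of the layout described above, the skeleton of such a data frame might be built as follows. The species labels and the use of `expand.grid` are assumptions made for this sketch, and `Count` is left as `NA` because the observed counts are not reproduced in the table.

```r
# Sketch only: one row per cell of the five-way table.
# The first factor varies fastest, so Sp cycles within T, T within S, and so on.
lizards <- expand.grid(
  Sp = c("grahamii", "opalinus"),     # species labels are assumed spellings
  T  = c("Early", "MidDay", "Late"),
  S  = c("Sun", "Shade"),
  D  = c("Thin", "Thick"),
  H  = c("Low", "High")
)

# Placeholder: the observed counts would be filled in from the table.
lizards$Count <- NA_integer_

nrow(lizards)   # 2 * 3 * 2 * 2 * 2 = 48 cells
str(lizards)
```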

(a) State which variables are explanatory variables and which are response variables for the purpose of this analysis.

(b) State the minimal model. Is the minimal model a graphical model?

glm(Count ~ Sp * H * D * S * T, data=lizards, family=poisson)

and then successively remove certain interaction terms. List all interaction terms that we (potentially) have to test.
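The mechanics of removing one interaction term from a saturated Poisson model and testing it could be sketched as below. Because the lizard counts are not available here, the sketch uses R's built-in `UCBAdmissions` table purely as a stand-in. It shows only the `update()` and `anova()` pattern, not which terms should be tested for the lizard data.

```r
# Stand-in data (an assumption for illustration): a built-in three-way table
# with factors Admit, Gender, Dept and cell counts in Freq.
ucb <- as.data.frame(UCBAdmissions)

# Saturated loglinear model for the stand-in table.
saturated <- glm(Freq ~ Admit * Gender * Dept, data = ucb, family = poisson)

# Drop the highest-order interaction and keep everything else.
reduced <- update(saturated, . ~ . - Admit:Gender:Dept)

# Likelihood-ratio (deviance) test of the removed term.
lrt <- anova(reduced, saturated, test = "Chisq")
lrt
```

The same pattern would apply to the `lizards` fit, replacing the formula and the term being removed.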

3. This question concerns the effect of sex and race on political party identification among U.S. voters. The data are given in the following table.

		Party Identification
Sex	Race	Democrat	Republican	Independent
male	white	132	176	127
male	black	42	6	12
female	white	172	129	130
female	black	56	4	15
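For working with these counts in R, one possible way to enter the table is sketched below. The object and column names (`party`, `Count`) are illustrative choices, not names taken from the appendix output.

```r
# Sketch: enter the 2 x 2 x 3 table in long format.
# expand.grid varies the first factor fastest: Party, then Race, then Sex.
party <- expand.grid(
  Party = c("Democrat", "Republican", "Independent"),
  Race  = c("white", "black"),
  Sex   = c("male", "female")
)

party$Count <- c(132, 176, 127,   # male, white
                  42,   6,  12,   # male, black
                 172, 129, 130,   # female, white
                  56,   4,  15)   # female, black

# Cross-tabulate to check the entry against the printed table.
tab <- xtabs(Count ~ Sex + Race + Party, data = party)
ftable(tab, row.vars = c("Sex", "Race"))
```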

On pages 10–11 of the appendix, you find the output of five models, named fm0, fm1, fm2, fm3 and fm4, that were fitted to these data using R. Use this output to answer the following questions.

(a) Draw a model lattice that shows how these five models are nested within each other.

(b) Which of these models is your preferred model? Justify your answer using likelihood-ratio tests.

(c) For your preferred model, write down the fitted equation for the log odds of a person preferring ‘Democrat’ instead of ‘Independent’. Take care to define all the symbols that you use.

(d) For your preferred model, find the fitted equation for the log odds of a person preferring ‘Democrat’ instead of ‘Republican’. Take care to define all the symbols that you use. Except for the estimate of the intercept, interpret each estimate in this fitted equation in a sentence or two.

4. Consider a random variable $X$ with a binomial distribution with parameters $n$ and $\pi$, i.e. $X \sim \mathrm{Bin}(n, \pi)$. Let $x$ denote an observed value of $X$. The maximum likelihood estimator of $\pi$ is $\hat{\pi} = X/n$. In this context, other parameters that are often of interest are the odds $\theta = \pi/(1-\pi)$ and the log-odds $\psi = \log\theta = \log\{\pi/(1-\pi)\}$.

(a) Write down expressions for E(X) and Var(X).

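(b) Verify, using the delta method, that the approximate mean and variance of $\hat{\psi} = \log\{\hat{\pi}/(1-\hat{\pi})\}$ are $E(\hat{\psi}) \approx \psi$ and $\mathrm{Var}(\hat{\psi}) \approx 1/\{n\pi(1-\pi)\}$, respectively.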
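A quick simulation can serve as a sanity check on a first-order delta-method approximation of this kind. The sketch below compares the simulated mean and variance of the sample log-odds with $\psi$ and with $1/\{n\pi(1-\pi)\}$, the standard first-order result for the log-odds of a binomial proportion. The values of `n`, `pi_0` and `n_sims` are hypothetical, chosen only for illustration.

```r
# Hypothetical settings for the check (not taken from the question).
set.seed(1)
n      <- 200
pi_0   <- 0.3
n_sims <- 100000

# Simulate binomial counts; the log-odds is undefined at x = 0 or x = n.
x <- rbinom(n_sims, size = n, prob = pi_0)
x <- x[x > 0 & x < n]
psi_hat <- log(x / (n - x))      # equals log(pi_hat / (1 - pi_hat))

# Target values: true log-odds and first-order delta-method variance.
psi       <- log(pi_0 / (1 - pi_0))
delta_var <- 1 / (n * pi_0 * (1 - pi_0))

c(sim_mean = mean(psi_hat), psi = psi)
c(sim_var = var(psi_hat), delta_var = delta_var)
```

The agreement is only approximate, since the delta method ignores higher-order terms, and it should improve as `n` grows.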

(d) Use the delta method to find expressions in terms of $x$ for the approximate mean and variance of $\hat{\theta} = \hat{\pi}/(1-\hat{\pi})$.

(a) Why are there four intercepts? Explain how they determine the estimated response distribution for males in urban areas wearing seat belts.

(b) Construct a confidence interval for the effect of gender, given seat-belt use and location. Interpret.

(c) Find the estimated cumulative odds ratio between the response and seat-belt use for those in rural locations and for those in urban locations, given gender. Based on this, explain how the effect of seat-belt use varies by region, and explain how to interpret the interaction estimate, −0.1244.

2022-11-23
