闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

STAC51: Categorical Data Analysis

Assignment 1

Deadline to hand in: Jan. 29 (Sunday) 10:00 am, 2023

Total: 100 points

Note: Submit the R markdown ﬁle along with your compiled pdf ﬁle and/or hand-written part. You should provide comments quoting necessary values from your outputs so that it will be easy for TAs to grade your work.

Whenever you are using an R for generating random numbers, set seed to your student number. This can be done by simply adding the command set.seed(your student number) before generating the random number.

Q. 1 (10 pts) (2 pts each) Let (Y1 , Y2 , . . . , Yk ) ~ Multinomial(n, π1 , π2 , . . . , πk ), then

(a) Calculate the moment generating function (this will be a multivariate MGF).

(b) Show that E(Yj ) = nπj

(d) Show that Cov(Yi , Yj ) = -nπi πj

(e) Show that for C = 2 (two categories), Cor(Y1 , Y2 ) = -1. Explain why.

Note: you can use the Multivariate MGF. Recall if Y = (Y1 , Y2 , . . . , Yk ) be a multivariate random variable, then the MGF is deﬁned as:

MY (t1 , t2 , . . . , tk ) = E(exp(t1 Y1 + . . . + tk Yk ))

Now use the partial derivative on tj or ti , tj to achieve the results.

Q. 2 (20 pts) At any given time, soft-drink dispensers may harbor bacteria such as Chry- seobacterium meningosepticum that can cause illness. To estimate the pro- portion of contaminated soft-drink dispensers in a community in Toronto, researchers randomly sampled 30 dispensers and found 5 to be contaminated with Chryseobacterium meningosepticum. Let π be the proportion of con- taminated dispensers in the population.

(a) (6 pts) We want to test whether 10% of the dispensers in the population were contaminated. Compute the test statistics for a score test, a Wald test, and a likelihood ratio test for the hypotheses

H0 : π = 0.1 v.s. Ha : π 0.1

and report the 3 P-values.

(b) (2 pts) Find the 90% Wald conﬁdence interval for π .

(d) (3 pts) Find the 90% Agresti-Coull conﬁdence interval for π . (e) (3 pts) Verify your score test and score conﬁdence interval using R. (f) (3 pts) Verify Wald, Agresti-Coull conﬁdence interval using R.

Q. 3 (10 pts) (2 pts each) Continue the previous problem with y = 5 and n = 30. In this problem, we will ﬁnd the likelihood-ratio test based conﬁdence interval for π:

(a) Find a0 , the maximized likelihood under H0 : π = π0 (b) Find a1 , the maximized likelihood over all possible π values.

(d) How big the likelihood-ratio test statistic must be at least to be signiﬁ- cant at 0.1 level? Use R command, qchisq().

(e) Compute the 90% likelihood-ratio test-based conﬁdence interval.

Q. 4 (20 pts) (10 pts each)Observed (or true) coverage and the targeted coverage proba- bilities of conﬁdence intervals are not necessary equal. In this question, we will calculate the observed (or true) coverage probability of Wald conﬁdence intervals using two methods: Mote Carlo simulation and direct calculation.

(a) (Monte Carlo simulation) Generate N = 100, 000 observations on Y where Y ~ Bin(n, π), where n = 25 and π = 0.06. From each obser- vation generated, calculate a Wald 95% conﬁdence interval for the pop- ulation proportion (π). (Note: This means you are calculating 100,000 conﬁdence intervals). Calculate the proportion of these Wald intervals that contain 0.06 (the value of π). Comment on your results.

(b) (Direct calculation) In order to calculate the coverage probability for a known value of π, calculate a conﬁdence interval for every possible value of y (y = 0, 1, . . . , n) and check whether true value of the parameter is in the conﬁdence interval calculated. Below are the steps:

i. Find all possible intervals that one could have with y = 0, . . . , n

ii. Form I(y) = 1 if the interval for a y contains π and 0 otherwise

iii. Calculate the true conﬁdence level as

y 0 I(y) ╱ \y(n) (π)y (1 - π)n −y

Q. 5 (20 pts) In this question, we will also calculate and plot the true coverage probabilities of Wald conﬁdence intervals for proportions (i.e. Binomial parameter) based on a sample of given size (n), but this time we calculate the coverage prob- abilities for many values of π making a plot of coverage probability versus π .

(a) (5 pts) For a Binomial sample of size n = 25, use the method in part (b) of the previous question (i.e. direct calculation) to calculate the coverage probability of a 95 % conﬁdence interval for π = 0.01, 0.02, . . . , 0.99 and plot them against π . Draw a horizontal line through the target probability 0.95. Comment on what you learned from your plot.

(b) (5 pts) Repeat part (a) above with n = 500 and plot both the curves on the same plot. Compare and comment on your ﬁndings.

(c) (10 pts) Repeat part (a) for Wald, Wilson (Score), Agresti-Coull and Clopper-Preason conﬁdence intervals and plot the coverage probabilities versus π for all four conﬁdence intervals on one graph (i.e all four curves on the same system of axes). Do not use built-in R functions from “binom” package for these conﬁdence intervals.

Use four diﬀerent colours for easy comparison. Compare and com- ment on your results. (Note that in this part, we are using the same values as in part (a) above, i.e n = 25, 95% conﬁdence interval and π = 0.01, 0.02, . . . , 0.99)

Q. 6 (20 pts) (5 pts each) Feller (1957) includes a number of examples of ob- servations ﬁtting the Poisson distribution, including data on the number of ﬂying-bomb hits in the south of London during World War II. The city was divided into 576 small areas of one-quarter square kilometers each, and the number of areas hit exactly k times was counted. The observed frequencies are given in the table below.

Frequency 229 211 93 35 7 1

Assume that the number of hits in a small area (of one-quarter square kilo- meters) has a Poisson distribution with parameter λ . Note that the Poisson p.m.f, P (Y = y) = e −y(入)!λ入

(a) Find the likelihood function for this observed data (you may ignore the

terms that are not important for maximum likelihood estimation). (b) Write an R code to plot the likelihood function (for λ e (0.6, 1.2))

(d) Use the likelihood ratio test to test the null hypothesis H0 : λ = 1 against the alternative H1 : λ 1