
STAT 426 Assignment 2

Due Wednesday, February 1, 11:59 pm.

Submit through Moodle.

Name:  SOLUTIONS

Netid:

Submit your computational work both as an R Markdown (*.Rmd) document and as a pdf, along with any files needed to run the code. Embed your answers to each problem in the document below after the question statement. If you have hand-written work, please scan or take pictures of it and include it in a pdf file, ideally combined with your pdf output file from R Markdown. Be sure to show your work.

Problem 1 (10 pts)

Genotypes AA, Aa, and aa occur in the population with probabilities (π1 , π2 , π3 ) respectively. A random sample of n from the population gives genotype counts (n1 , n2 , n3 ), where n = n1 + n2 + n3 .

(a) (2 pts) If n = 3, show all the possible observation vectors (n1 , n2 , n3 ).

Answer:

(3,0,0), (2,1,0), (2,0,1), (1,1,1),

(0,3,0), (1,2,0), (0,2,1),

(0,0,3), (1,0,2), (0,1,2)
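The list above can be checked by brute force. The following is a minimal sketch, assuming nothing beyond base R; the names `grid` and `obs` are illustrative and not part of the assignment.

```r
# Enumerate all (n1, n2, n3) with non-negative integer entries summing to n.
n <- 3
grid <- expand.grid(n1 = 0:n, n2 = 0:n, n3 = 0:n)
obs <- grid[rowSums(grid) == n, ]
rownames(obs) <- NULL
obs

# Number of vectors found, and the stars-and-bars count for 3 categories.
nrow(obs)
choose(n + 3 - 1, 3 - 1)
```

Both counts should agree with the ten vectors listed in the answer.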

(b) (2 pts) If (π1 , π2 , π3 ) = (0.25, 0.50, 0.25) and n = 3, compute the multinomial probability that (n1 , n2 , n3 ) = (0, 2, 1).

Answer:

$$\frac{3!}{0!\,2!\,1!} \times 0.25^0 \times 0.50^2 \times 0.25^1 = 3 \times 0.25 \times 0.25 = 0.1875$$
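As a check on the hand calculation, here is a short sketch that computes the same probability two ways: from the multinomial formula directly and with base R's `dmultinom`. The variable names are illustrative.

```r
counts <- c(0, 2, 1)
probs <- c(0.25, 0.50, 0.25)

# By hand: multinomial coefficient times the product of pi_j^n_j.
coef <- factorial(sum(counts)) / prod(factorial(counts))
by_hand <- coef * prod(probs^counts)

# Built-in multinomial probability mass function.
built_in <- dmultinom(counts, prob = probs)

c(by_hand = by_hand, built_in = built_in)
```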

(c) (2 pts) If (π1 , π2 , π3 ) = (0.25, 0.50, 0.25) and n = 3, compute P (n3  = 1).

Answer:

n3  ~ Binomial(3, 0.25) so

$$P(n_3 = 1) = \binom{3}{1} \times 0.25^1 \times 0.75^2 = 3 \times 0.25 \times 0.75^2 = 0.75^3 = 0.421875 \approx 0.422$$

Alternative method:

dbinom(1, size = 3, prob = 0.25)

## [1] 0.421875
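The binomial shortcut can also be confirmed by marginalizing the multinomial directly. This is a sketch under the assumption that we simply sum the multinomial probabilities of the three observation vectors from part (a) that have n3 = 1; `vecs` and `p_marginal` are illustrative names.

```r
probs <- c(0.25, 0.50, 0.25)

# All observation vectors with n = 3 and n3 = 1.
vecs <- list(c(2, 0, 1), c(1, 1, 1), c(0, 2, 1))

# Sum their multinomial probabilities to get P(n3 = 1).
p_marginal <- sum(sapply(vecs, dmultinom, prob = probs))
p_marginal
```

The sum should match the `dbinom` result.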

(d) (2 pts) If (n1 , n2 , n3 ) = (0, 2, 1), compute the log likelihood for (π1 , π2 , π3 ) = (0.25, 0.50, 0.25) using only the kernel of the likelihood.

Answer:

Ignoring the multiplicative constant in the multinomial probability mass function, the kernel of the likelihood is

$$\ell(\pi) = \pi_1^{n_1} \pi_2^{n_2} \pi_3^{n_3}$$

so the log-likelihood has the form

$$L(\pi) = n_1 \log(\pi_1) + n_2 \log(\pi_2) + n_3 \log(\pi_3).$$

Therefore,

L({0.25, 0.50, 0.25}) = 0 * log(0.25) + 2 * log(0.50) + 1 * log(0.25) = 2 * log(0.50) + log(0.25)

2 * log(0.50) + log(0.25)

## [1] -2.772589

(e) (2 pts) If (n1 , n2 , n3 ) = (0, 2, 1), compute (i) the sample proportions pj  = nj /n, j = 1, 2, 3, and (ii) the log likelihood ratio statistic −2(L({0.25, 0.50, 0.25}) − L({p1 , p2 , p3 })).

Answer:

(i)  (p1 , p2 , p3 ) = (0, 2/3, 1/3)

(ii)  L({p1 , p2 , p3 }) = 2 * log(2/3) + log(1/3)

Lp = 2 * log(2/3) + log(1/3)  # log-likelihood for sample proportions

L0 = -2.772589                # From part (d)

-2 * (L0 - Lp)

## [1] 1.726093
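The same statistic can be computed without hard-coding the value from part (d). The sketch below assumes the usual convention that a term with n_j = 0 contributes 0 to the kernel log-likelihood, and also evaluates the algebraically equivalent form 2 Σ n_j log(p_j / π_j). The helper `loglik` and the variable names are illustrative, not part of the assignment.

```r
n_obs <- c(0, 2, 1)
pi0 <- c(0.25, 0.50, 0.25)
p_hat <- n_obs / sum(n_obs)

# Kernel log-likelihood; cells with zero count contribute 0.
loglik <- function(p, counts) {
  sum(ifelse(counts > 0, counts * log(p), 0))
}

# Likelihood ratio statistic as defined in the question.
G2 <- -2 * (loglik(pi0, n_obs) - loglik(p_hat, n_obs))

# Equivalent form: 2 * sum of n_j * log(p_j / pi_j) over cells with n_j > 0.
pos <- n_obs > 0
G2_alt <- 2 * sum(n_obs[pos] * log(p_hat[pos] / pi0[pos]))

c(G2 = G2, G2_alt = G2_alt)
```

Here every nonzero cell has p_j / π_j = 4/3, so the statistic reduces to 6 log(4/3).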

Problem 2 (10 pts)

For diagnostic testing, let X = true status (1 = disease, 2 = no disease), and let Y = diagnosis (1 = positive, 2 = negative). Let $\pi_{ij} = P(X = i, Y = j)$ for i = 1, 2 and j = 1, 2. Also recall our notation that $\pi_{i+} = P(X = i)$ and $\pi_{+j} = P(Y = j)$.

(a) (2 pts) The sensitivity of a diagnostic test is the conditional probability that the test is positive given true status = disease. Give a formula for the sensitivity in terms of the $\pi_{ij}$.

Answer:

$$\text{Sensitivity} = P(Y = 1 \mid X = 1) = \frac{\pi_{11}}{\pi_{1+}} = \frac{\pi_{11}}{\pi_{11} + \pi_{12}}$$

(b) (2 pts) The specificity of a diagnostic test is the conditional probability that the test is negative given true status = no disease. Give the formula for specificity in terms of the $\pi_{ij}$.

Answer:

$$\text{Specificity} = P(Y = 2 \mid X = 2) = \frac{\pi_{22}}{\pi_{2+}} = \frac{\pi_{22}}{\pi_{21} + \pi_{22}}$$

(c) (2 pts) Suppose the probability of disease is P(X = 1) = 0.01. Also suppose the sensitivity = 0.86, and the specificity = 0.88. Determine all the cell probabilities and marginal probabilities for the following table:

         Y = 1    Y = 2
X = 1    π11      π12      π1+
X = 2    π21      π22      π2+
         π+1      π+2

Answer:

We are given

$$\frac{\pi_{11}}{\pi_{1+}} = 0.86$$

so $\pi_{11} = (0.01)(0.86) = 0.0086$, $\pi_{12} = \pi_{1+} - \pi_{11} = 0.01 - 0.0086 = 0.0014$, and $\pi_{2+} = 1 - \pi_{1+} = 0.99$. We are also given

$$\frac{\pi_{22}}{\pi_{2+}} = 0.88$$

so $\pi_{22} = (0.99)(0.88) = 0.8712$ and $\pi_{21} = \pi_{2+} - \pi_{22} = 0.99 - 0.8712 = 0.1188$.

Finally, $\pi_{+1} = \pi_{11} + \pi_{21} = 0.0086 + 0.1188 = 0.1274$ and $\pi_{+2} = 1 - \pi_{+1} = 1 - 0.1274 = 0.8726$. Summarizing, we have the table:

         Y = 1    Y = 2    Total
X = 1    0.0086   0.0014   0.01
X = 2    0.1188   0.8712   0.99
Total    0.1274   0.8726   1
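The table can be rebuilt in R from the three given quantities. This is a minimal sketch; the names `prev`, `sens`, `spec`, and `joint` are illustrative, and `addmargins` is used only to append the row and column totals.

```r
prev <- 0.01  # P(X = 1)
sens <- 0.86  # P(Y = 1 | X = 1)
spec <- 0.88  # P(Y = 2 | X = 2)

# Joint probabilities pi_ij = P(X = i) * P(Y = j | X = i).
joint <- matrix(
  c(prev * sens,             prev * (1 - sens),
    (1 - prev) * (1 - spec), (1 - prev) * spec),
  nrow = 2, byrow = TRUE,
  dimnames = list(X = c("1", "2"), Y = c("1", "2"))
)

# Append marginal totals to the joint table.
addmargins(joint)
```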

(d) (2 pts) Based on the numbers in (c), compute the conditional probability of disease given a positive test. Why is this number so small even though the sensitivity is pretty high?

Answer:

$$P(X = 1 \mid Y = 1) = \frac{\pi_{11}}{\pi_{+1}} = \frac{0.0086}{0.1274} = 0.0675$$

Because the disease is rare (prevalence 0.01), false positives from the large disease-free group (probability 0.1188) are far more common than true positives (probability 0.0086), so most positive tests come from people without the disease.
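To see how strongly this conditional probability depends on prevalence, the sketch below applies Bayes' rule with the sensitivity and specificity held at the values from (c). Only the prevalence 0.01 comes from the problem; the other prevalence values are hypothetical, chosen purely for illustration, and `ppv` is an illustrative helper name.

```r
# P(X = 1 | Y = 1) by Bayes' rule, as a function of prevalence.
ppv <- function(prev, sens = 0.86, spec = 0.88) {
  prev * sens / (prev * sens + (1 - prev) * (1 - spec))
}

# 0.01 is from part (c); the remaining prevalences are hypothetical.
prev <- c(0.01, 0.05, 0.10, 0.25, 0.50)
data.frame(prevalence = prev, ppv = round(ppv(prev), 4))
```

The first row reproduces the answer above, and the probability rises as the disease becomes more common.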

(e) (2 pts) Based on the numbers in (c), compute the conditional probability of no disease given a negative test.

Answer:

$$P(X = 2 \mid Y = 2) = \frac{\pi_{22}}{\pi_{+2}} = \frac{0.8712}{0.8726} = 0.998$$
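Both conditional probabilities in (d) and (e) can be read off at once by normalizing the columns of the joint table. This is a sketch using base R's `prop.table`, with the cell probabilities from (c) entered directly; `joint` and `cond` are illustrative names.

```r
# Joint probabilities from part (c): rows are X, columns are Y.
joint <- matrix(
  c(0.0086, 0.0014,
    0.1188, 0.8712),
  nrow = 2, byrow = TRUE,
  dimnames = list(X = c("1", "2"), Y = c("1", "2"))
)

# Divide each column by its total to get P(X = i | Y = j).
cond <- prop.table(joint, margin = 2)
round(cond, 4)
```

The entry in row X = 1, column Y = 1 is the answer to (d), and the entry in row X = 2, column Y = 2 is the answer to (e).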