Bayesian Statistics Level M 2022
Wednesday 4th May 2022
EXAMINATION FOR THE DEGREES OF M.SCI., M.RES. AND M.SC.
(SCIENCE)
Bayesian Statistics Level M
1. Consider yi to be observed counts coming from a Poisson distribution. Consider the following model with parameter λ:

p(yi | λ) = λ^yi e^{-λ} / yi!,   λ > 0, yi ∈ N0
(a) Write down the likelihood function appropriate for n i.i.d. observations y1 , . . . , yn .
[2 MARKS]
(b) Explain the Bayesian concept of a conjugate prior. Show that the Ga(α, β) distribution is conjugate for i.i.d. data y1 , . . . , yn using the likelihood derived in part (1a). [2 MARKS]
(c) Derive the Jeffreys’ prior p(λ) for the parameter λ in the Poisson model for n observations above. [4 MARKS]
(d) Is the obtained Jeffreys’ prior for λ proper? Demonstrate this. [2 MARKS]
(e) Show that the posterior distribution resulting from using the Jeffreys’ prior is equivalent to Ga(∑_{i=1}^{n} yi + 1/2, n), and demonstrate whether it is proper. [2 MARKS]
(f) Assume that y1 , . . . , yn have been observed. Using the Jeffreys’ prior, derive the posterior predictive distribution for yn+1 . [6 MARKS]
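As a sketch of what part (1f) leads to (my own illustration, not part of the paper): under the Ga(α, n) posterior with α = ∑ yi + 1/2, the posterior predictive for a new count is negative binomial. This can be checked numerically by integrating Poisson × Gamma over λ; the counts 3, 2, 1, 2 from part (1g) are borrowed as example data.

```python
from math import exp, factorial, gamma, lgamma, log

# Closed-form predictive (negative binomial with non-integer shape),
# computed on the log scale for numerical stability.
def predictive_closed_form(y_new, alpha, n):
    lp = (lgamma(y_new + alpha) - lgamma(y_new + 1) - lgamma(alpha)
          + alpha * log(n / (n + 1)) - y_new * log(n + 1))
    return exp(lp)

# Numerical check: midpoint Riemann sum of
# Poisson(y_new | lam) * Ga(lam | alpha, n) over lam.
def predictive_numeric(y_new, alpha, n, upper=40.0, grid=100000):
    h = upper / grid
    total = 0.0
    for i in range(grid):
        lam = (i + 0.5) * h
        pois = exp(-lam) * lam ** y_new / factorial(y_new)
        ga = n ** alpha / gamma(alpha) * lam ** (alpha - 1) * exp(-n * lam)
        total += pois * ga * h
    return total

alpha, n = 8.5, 4        # example: counts 3, 2, 1, 2 give sum(y) + 1/2 = 8.5
p0 = predictive_closed_form(0, alpha, n)
p0_num = predictive_numeric(0, alpha, n)
```

The closed form and the numerical integral agree, and the predictive probabilities sum to 1 over the support, which is the defining property of the negative binomial answer.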
(g) Using a Gamma(2, 2) prior on λ, and having observed counts 3, 2, 1, and 2, what is the maximum a posteriori (MAP) estimate of λ? (The mode of a Gamma distribution Ga(α, β) is (α − 1)/β for α ≥ 1.) [2 MARKS]
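A quick numerical check of part (1g) (my own sketch, not part of the paper), using the standard Poisson–Gamma conjugate update:

```python
# With a Gamma(a, b) prior and Poisson counts y, the posterior is
# Gamma(a + sum(y), b + n); its mode (alpha - 1)/beta is the MAP estimate.
def poisson_gamma_map(a, b, counts):
    alpha = a + sum(counts)    # posterior shape
    beta = b + len(counts)     # posterior rate
    return (alpha - 1) / beta  # posterior mode = MAP estimate

map_est = poisson_gamma_map(2, 2, [3, 2, 1, 2])  # posterior is Gamma(10, 6)
```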
2. Suppose you perform ten independent Bernoulli trials and observe six successes. Let θ denote the probability of success on each trial.
(a) Using the Be(1, 1) prior, and after observing the data, 6 successes in 10 trials,
what probability would you attach to the event that the eleventh trial will result in a success? [4 MARKS]
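The Beta–Binomial update behind part (2a) can be sketched numerically (my own check, not part of the paper): a Be(1, 1) prior with s successes in n trials gives a Be(1 + s, 1 + n − s) posterior, and the predictive probability of success on the next trial is the posterior mean of θ.

```python
from fractions import Fraction

# Exact arithmetic with Fractions avoids floating-point noise.
def next_success_prob(s, n):
    alpha = Fraction(1 + s)        # posterior Beta shape 1
    beta = Fraction(1 + n - s)     # posterior Beta shape 2
    return alpha / (alpha + beta)  # posterior mean of theta

p11 = next_success_prob(6, 10)     # posterior is Be(7, 5)
```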
(b) You may believe that Beta priors are not flexible enough and prefer to specify a prior on θ by means of a normal prior on the logits:

log( θ / (1 − θ) ) ~ N(µ, σ²)

You want the induced prior on θ to satisfy the following conditions:

Pr [θ > 0.5] = 1/2 and Pr [θ < 0.75] = 0.975

Determine the values of µ and σ² required. [6 MARKS]
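One solution path for part (2b) can be checked numerically (my own sketch, and an assumed solution route, not part of the paper): the first condition forces the median of logit(θ) to be logit(0.5) = 0, so µ = 0; the second puts logit(0.75) = log 3 at the 97.5% point of the normal, so σ = log(3)/z_{0.975}.

```python
from math import log
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # standard normal 97.5% point, ~1.95996
mu = 0.0                         # from Pr[theta > 0.5] = 1/2
sigma = log(3) / z               # from Pr[theta < 0.75] = 0.975
sigma2 = sigma ** 2

# Check the two induced probabilities on the logit scale.
p_gt_half = 1 - NormalDist(mu, sigma).cdf(log(0.5 / 0.5))  # logit(0.5) = 0
p_lt_075 = NormalDist(mu, sigma).cdf(log(0.75 / 0.25))     # logit(0.75) = log 3
```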
3. Consider the following model
yij | µi ~ N(µi , σy²), i = 1, . . . , n, j = 1, . . . , ni , independently; µi = µ + bi ,
bi ~ N(0, σb²), i = 1, . . . , n, independently;

where µ , σy², and σb² are fixed constants.
(a) Derive the joint probability density function of the {yij} and the {bi} under the model, p(b, y). [3 MARKS]
(b) Find the full conditional distributions of bi . These should be known distributions.
[6 MARKS]
(c) Explain how you can sample from p(b | y). [2 MARKS]
(d) Suppose that a sample b^(t) , t = 1, . . . , T, from the posterior distribution of b is available. Explain how you can use it to compute an estimate of the posterior predictive distribution of a future observation ỹ. [3 MARKS]
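The computations in parts (3b)–(3d) can be sketched in code (my own illustration with assumed parameter values and simulated data, not part of the paper). Because µ, σy², and σb² are fixed, the full conditional of each bi is normal, the bi are independent given y, and a sample from p(b | y) turns posterior-predictive estimation into simple Monte Carlo averaging.

```python
import random
from statistics import mean

random.seed(1)
# Assumed constants and simulated data, for illustration only.
mu, sig_y, sig_b = 5.0, 1.0, 2.0
n, n_i = 3, 10
b_true = [random.gauss(0, sig_b) for _ in range(n)]
y = [[random.gauss(mu + b_true[i], sig_y) for _ in range(n_i)] for i in range(n)]

# Full conditional of b_i: normal, with precision-weighted combination of the
# likelihood term (n_i / sig_y^2) and the prior term (1 / sig_b^2).
def sample_b(y_i):
    prec = len(y_i) / sig_y**2 + 1 / sig_b**2
    m = (sum(yij - mu for yij in y_i) / sig_y**2) / prec
    return random.gauss(m, prec ** -0.5)

# Given y, the b_i are independent, so p(b | y) can be sampled directly.
T = 5000
b_draws = [[sample_b(y[i]) for i in range(n)] for _ in range(T)]

# Posterior predictive of a future observation from group 1: for each draw
# b^(t), simulate y_tilde ~ N(mu + b_1^(t), sig_y^2) and use the empirical
# distribution of these simulated values.
y_pred = [random.gauss(mu + b[0], sig_y) for b in b_draws]
pred_mean = mean(y_pred)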
4. Eight different training methods for a particular exam have been implemented in classes in different schools. In some preliminary data analysis, the overall effect yj on the exam result in the jth school (where training method j was used; j = 1, . . . , 8) has been estimated along with an estimate of its accuracy (reflected in the estimated standard error, σj ), as reported in this table:
School j                 1    2    3    4    5    6    7    8
Effect of training yj   28    8   -3    7   -1    1   18   12
Standard error σj       15   10   16   11    9   11   10   18
The higher values of the effect correspond to better training performance. To explore the effects of the different training methods in general, a Bayesian normal hierarchical model is formulated and fitted in WinBUGS with the following model code:
model
{
  for (j in 1:8) {
    y[j] ~ dnorm(theta[j], inv.sig2[j])
    theta[j] ~ dnorm(mu, inv.tau2)
    inv.sig2[j] <- 1/(sigma[j] * sigma[j])
  }
  mu ~ dnorm(0, 1.0E-6)
  inv.tau2 <- 1/(tau*tau)
  tau ~ dunif(0, 100)
}
(Here y[j] corresponds to yj and sigma[j] to σj.)
(a) Convert the WinBUGS model specification back into standard statistical notation, taking care to include distributions that fully characterise the likelihood and the prior distribution, and to state which distributions are independent of each other. Remember that the WinBUGS function dnorm takes the precision of the normal distribution as its second parameter. [4 MARKS]
(b) The table below presents posterior summaries produced by WinBUGS:
node      mean   sd     MC error  2.5%    median  97.5%  start  sample
theta[1]  11.25  8.221  0.09032   -2.198  10.07   31.12  501    99500
theta[2]  7.835  6.218  0.06468   -4.588  7.747   20.44  501    99500
theta[3]  6.106  7.692  0.06912   -11.31  6.633   20.48  501    99500
theta[4]  7.588  6.490  0.06546   -5.536  7.593   20.64  501    99500
theta[5]  5.112  6.344  0.06719   -8.935  5.663   16.54  501    99500
theta[6]  6.106  6.672  0.06461   -8.437  6.497   18.60  501    99500
theta[7]  10.59  6.759  0.07920   -1.385  9.906   25.86  501    99500
theta[8]  8.377  7.778  0.07115   -6.737  8.068   25.39  501    99500
mu        7.862  5.149  0.06598   -2.196  7.784   18.10  501    99500
tau       6.478  5.576  0.08033   0.2771  5.140   20.50  501    99500
In the remainder of this question, the symbols θi , µ and τ will be used to represent the variables called theta[i], mu and tau, respectively, in WinBUGS. For each of the posterior summaries listed below, state whether it can be determined using the WinBUGS output above and why. If your answer is “yes”, compute the estimate; otherwise explain how you would compute it in WinBUGS.
i. The central 95% posterior interval for θ3 . [2 MARKS]
ii. The posterior mean of θ1 - θ5 . [2 MARKS]
(c) Explain how you would estimate the posterior probability that the effect θ3 of training method 3 is better than that of training method 4. [Hint: You may need to modify the WinBUGS code. The step() function in WinBUGS takes the value 1 if its argument is positive and 0 otherwise.] [2 MARKS]
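The Monte Carlo idea behind WinBUGS's step() in part (4c) can be sketched in plain Python (my own illustration, not the exam's output): Pr[θ3 > θ4 | y] is estimated by the fraction of posterior draws with θ3 − θ4 > 0. In WinBUGS you would monitor a new node such as step(theta[3] - theta[4]) and read off its posterior mean; the draws below are independent normals with roughly the tabulated moments, used for illustration only, whereas the real joint draws would carry posterior correlation.

```python
import random
from statistics import mean

random.seed(0)
N = 100000
# Illustrative marginal draws (means/sds loosely matching the table above).
theta3 = [random.gauss(6.1, 7.7) for _ in range(N)]
theta4 = [random.gauss(7.6, 6.5) for _ in range(N)]

# Empirical analogue of monitoring step(theta[3] - theta[4]):
# fraction of draws where the difference is positive.
p_est = mean(1.0 if t3 > t4 else 0.0 for t3, t4 in zip(theta3, theta4))
```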
5. A random sample of size 5 is taken from a Poisson distribution with mean λ. The prior for λ is taken as

p(λ) = 0.2 · Gamma(28, 4) + 0.8 · Gamma(9, 3)
(a) Calculate the prior mean and variance. [2 MARKS]
(b) If ∑_{i=1}^{5} yi = 29, show that the posterior distribution of λ can be written as 0.595961 · Gamma(57, 9) + 0.404039 · Gamma(38, 8), and calculate its mean.
[6 MARKS]
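The mixture computations in question 5 can be verified numerically (my own sketch of the standard mixture-conjugacy result, not part of the paper): each Gamma component updates conjugately to Ga(a + ∑yi, b + n), and the mixture weights are reweighted by the component marginal likelihoods, computed on the log scale for stability.

```python
from math import exp, lgamma, log

def gamma_mean(a, b):
    return a / b

def gamma_second_moment(a, b):
    return a * (a + 1) / b**2

w = [0.2, 0.8]
pri = [(28, 4), (9, 3)]

# (a) Prior moments of the mixture.
prior_mean = sum(wi * gamma_mean(a, b) for wi, (a, b) in zip(w, pri))
prior_var = (sum(wi * gamma_second_moment(a, b) for wi, (a, b) in zip(w, pri))
             - prior_mean**2)

# (b) Conjugate update of each component and of the weights.
n, sum_y = 5, 29
post = [(a + sum_y, b + n) for a, b in pri]
# log component marginal likelihood (data-constant 1/prod(y_i!) cancels):
log_marg = [log(wi) + a * log(b) - lgamma(a) + lgamma(a2) - a2 * log(b2)
            for wi, (a, b), (a2, b2) in zip(w, pri, post)]
m = max(log_marg)
unnorm = [exp(v - m) for v in log_marg]
w_post = [u / sum(unnorm) for u in unnorm]
post_mean = sum(wi * gamma_mean(a, b) for wi, (a, b) in zip(w_post, post))
```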
6. It is common to report the results of an experiment rounded to certain precision. Consider the following experiment: we measure time in minutes between customers entering a certain shop. The number of minutes between customers can be modelled by an exponential distribution:
yi ~ Exp(θ), i.i.d.,

where yi are the observed time intervals, and θ is the rate of the exponential distribution, E [y] = 1/θ . We will infer θ from the observed yi . A non-informative improper Gamma prior is assumed for θ:

θ ~ Ga(1, 0).
We record two waiting times between customers as ỹ1 = 2 and ỹ2 = 3 minutes, rounded to a minute.
(a) Derive the posterior distribution of θ considering the observed waiting times as exact (not rounded, i.e. y1 = ỹ1 , y2 = ỹ2 ). What is the posterior mean waiting time? [2 MARKS]
(b) Derive the posterior distribution of θ taking rounding into account. This means
that actual values could be 1.5 < y1 < 2.5 and 2.5 < y2 < 3.5. What is the posterior mean waiting time in this case? [5,3 MARKS]
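A numerical sketch of both parts of question 6 (my own check, not part of the paper): with exact data the posterior is Ga(3, 5), so the posterior mean waiting time is E[1/θ | y] = 5/2. With rounding, each observation contributes an interval probability to the likelihood, e.g. P(1.5 < y1 < 2.5 | θ) = exp(−1.5θ) − exp(−2.5θ), and the resulting posterior mean waiting time can be computed by numerical integration.

```python
from math import exp

exact_mean_wait = 5 / 2   # Ga(3, 5) posterior: E[1/theta | y] = beta/(alpha-1)

def post_unnorm(theta):
    # Ga(1, 0) prior is flat on theta > 0; rounded-interval likelihood:
    return ((exp(-1.5 * theta) - exp(-2.5 * theta))
            * (exp(-2.5 * theta) - exp(-3.5 * theta)))

def riemann(f, upper=40.0, grid=200000):
    # midpoint Riemann sum over (0, upper); the integrand decays like
    # exp(-4 theta), so the truncation error is negligible
    h = upper / grid
    return sum(f((i + 0.5) * h) for i in range(grid)) * h

Z = riemann(post_unnorm)                                   # normalising constant
mean_wait = riemann(lambda t: (1 / t) * post_unnorm(t)) / Z
```

Accounting for rounding pulls the posterior mean waiting time slightly below the exact-data answer of 2.5 minutes.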
7. Consider the problem of estimating the rate parameter λ of the exponential model by a value (‘action’) a:
yi ~ Exp(λ), i = 1, . . . , n, independently
(a) Using a Gamma prior λ ~ Gamma(1, 1), after observing y1 = 0.7, y2 = 1, y3 = 0.8, y4 = 1.5, and using the following loss function:

L(λ, a) = a/λ − log(a/λ) − 1,

show that the Bayes action aT satisfies 1/aT = E [1/λ | y].
[Hint: E [f (x) | y] = ∫Ω f (x) p(x | y) dx .] [6 MARKS]
(b) Calculate the above Bayes action aT after observing the following data: y = (0.7, 1, 0.8, 1.5); your answer should be a number. [2 MARKS]
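A numerical sketch of part (7b) (my own check, assuming the Stein-type loss and the condition 1/aT = E[1/λ | y] from part (7a)): the Gamma(1, 1) prior with exponential data gives the posterior Ga(1 + n, 1 + ∑yi), and for Ga(α, β) one has E[1/λ] = β/(α − 1).

```python
y = [0.7, 1.0, 0.8, 1.5]
alpha = 1 + len(y)                 # posterior shape: 1 + n = 5
beta = 1 + sum(y)                  # posterior rate: 1 + sum(y) = 5
e_inv_lambda = beta / (alpha - 1)  # E[1/lambda | y] = 5/4
a_bayes = 1 / e_inv_lambda         # Bayes action a_T = 4/5
```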
Total: 80 MARKS
2022-07-07