MATH6174: Exercise Sheet 9
1. Rejection sampling. Assume that X1, . . . , Xn are independent identically distributed N(θ, 1) observations. Suppose that the prior distribution for θ is Cauchy with density
π(θ) = 1/(π(1 + θ²)) , −∞ < θ < ∞ .
Derive, up to a constant of proportionality, the posterior density of θ. Suppose that n = 10 and x̄ = 1.5. Write a short R function to implement rejection sampling for this density, using the prior density as g(θ). Plot your sample as a histogram. [You will need to find c such that π(θ|x)/g(θ) ≤ c.]
2. Suppose that x1, . . . , xn are i.i.d. observations from a Bernoulli distribution with mean θ.
(i) A logistic normal prior distribution is proposed for θ (a normal distribution for log(θ/(1 − θ))). Show that if the prior mean and variance for log(θ/(1 − θ)) are 0 and 1 respectively then the prior density function for θ is
π(θ) = (1/√(2π)) × 1/(θ(1 − θ)) exp{−[log(θ/(1 − θ))]²/2} .
(ii) As this prior distribution is not conjugate, the Bayes estimator E(θ|x1, . . . , xn) is not directly available. It is proposed to estimate it using a Monte Carlo sample generated by the Metropolis-Hastings method. One possible algorithm involves generating proposals from the prior distribution, independently of the current observation. If n = 10, Σxi = 8 and the current observation is θ = 0.5, generate the next two observations using this algorithm.
Standard normal random numbers: −0.6962, 1.0821, 0.9379.
Uniform random numbers: 0.5774, 0.9956, 0.6006.
3. [R question] Write R code to generate θ samples when
(i) θ ∼ N(5, 1.5²) (ii) θ ∼ Gamma(1, 0.5)
using a Metropolis (random walk) algorithm with a N(θ^(t), σ²) proposal distribution. Investigate the choice of proposal scaling σ in both cases, and compare your MCMC samples to draws from rnorm and rgamma.
4. [R question] Consider the following famous example on genetic linkage (Rao, 1965, pp. 368-369). A particular genetic model gives the probabilities for the four genotypes AB, Ab, aB and ab as (2 + θ)/4, (1 − θ)/4, (1 − θ)/4 and θ/4, where 0 < θ < 1 is the square of the recombination fraction, and is the parameter we wish to estimate.
This is a multinomial distribution with four cells, and likelihood function
f(x|θ) ∝ [(2 + θ)/4]^x1 [(1 − θ)/4]^x2 [(1 − θ)/4]^x3 [θ/4]^x4 ,
where x = (x1, . . . , x4) are the observed frequencies of the four genotypes. We observe the frequencies x1 = 125, x2 = 18, x3 = 20, x4 = 34.
(i) Write an R function to generate from the posterior distribution of θ, given the non-informative prior π(θ) ∝ 1, using a random walk Metropolis algorithm. Use your function to explore different choices of random walk standard deviation, and to produce trace plots and autocorrelation plots for θ.
We can also set this example up as a missing data problem, and use the Gibbs sampler (see below) to produce a sample from the posterior distribution of θ. Consider splitting one of the cells in two, to produce a multinomial with five cells. We will split the first genotype, AB, into two subgroups, AB1 and AB2, with cell probabilities 1/2 and θ/4. We do not observe the split between AB1 and AB2. Let z denote the number of observations in the AB2 cell. Then, given a uniform prior density for θ, the posterior density conditional on z would be
π(θ|x,z) ∝ f(x|θ,z)π(θ)
= [1/2]^(125−z) [θ/4]^z [(1 − θ)/4]^18 [(1 − θ)/4]^20 [θ/4]^34 .
(ii) What is the conditional distribution of θ|z, x?
(iii) Give an argument to show that the conditional distribution of z|θ, x is Binomial(125, θ/(θ + 2)).
We can generate a sample from a joint distribution π(θ1 , . . . ,θp |x) by sampling from the conditional distributions for each parameter, given the last values of the other parameters, i.e.
θ1^(t) ∼ π(θ1 | x, θ2^(t−1), . . . , θp^(t−1))
θ2^(t) ∼ π(θ2 | x, θ1^(t), θ3^(t−1), . . . , θp^(t−1))
. . .
θp^(t) ∼ π(θp | x, θ1^(t), . . . , θ(p−1)^(t)) .
MATH6174: Exercise Sheet 9 Solutions
1. Rejection sampling is a general-purpose algorithm for simulating random draws from a given probability distribution. The distribution need not have a standard, familiar form. We will describe the algorithm in terms of producing an independent sample from the posterior distribution π(θ|x). The first step is to identify another probability density g(θ) such that
• it is easy to simulate draws from g(θ) (e.g. using the r* functions in R)
• the density g(θ) resembles the posterior density, π(θ|x), of interest in terms of location and spread
• there exists a constant c such that π(θ|x) ≤ c g(θ) for all θ.
The final point above implies that π(θ|x)/g(θ) ≤ c. If we are able to find a density g(θ) with these properties, then draws from π(θ|x) are obtained by the following accept/reject algorithm:
(1) Simulate independently θ from g(θ) and a uniform random variate U from (0,1).
(2) If U ≤ π(θ|x)/cg(θ), then accept θ as a draw from the posterior density; otherwise, reject θ .
(3) Continue steps (1) and (2) until the required number of ‘accepted’ θ draws have been collected.
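These steps translate directly into R. Here is a minimal generic sketch (the names rejection.sample, dens.post, dens.g and r.g are my own placeholders, not from the notes); the unnormalized target density, the envelope density, its sampler and the bound c are all assumed to be supplied by the user.
rejection.sample<-function(nsamp,dens.post,dens.g,r.g,c)
{
out<-numeric(0)
while(length(out)<nsamp)
{
theta<-r.g(1)                                 # step (1): draw from g
u<-runif(1)
if(u<=dens.post(theta)/(c*dens.g(theta)))     # step (2): accept/reject
{
out<-c(out,theta)
}
}                                             # step (3): repeat until nsamp draws collected
return(out)
}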
Note that if c > sup π(θ|x)/g(θ), the algorithm is still valid but less efficient, as proposals are rejected more often than necessary. The posterior, up to a normalizing constant, is given by
π(θ|x) ∝ f(x|θ)π(θ) ∝ exp{−n(θ − x̄)²/2} × 1/(π(1 + θ²)) .
In the rejection method, g(θ) is the Cauchy prior density π(θ). Therefore, to find c,
sup_{−∞<θ<∞} π(θ|x)/g(θ) = sup_{−∞<θ<∞} exp{−n(θ − x̄)²/2} = 1 ,
with the supremum achieved at θ = x̄.
Your R function may look a bit like this:
rs.cauchy<-function(samplesize,xbar,n)
{
m<-0
theta<-NULL
while(m<samplesize)
{
temp<-rcauchy(1)                  # propose from the Cauchy prior
u<-runif(1)
if(u<=exp(-n*(temp-xbar)^2/2))    # accept with prob. pi(theta|x)/(c g(theta))
{
theta[m+1]<-temp
m<-m+1
}
}
return(theta)
}
theta<-rs.cauchy(1000,1.5,10)
hist(theta,prob=TRUE,xlab=expression(theta),main="")
[Figure: histogram of the sampled values of θ.]
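As an optional check (not part of the original solution), you can overlay the exact posterior density on the histogram, computing the normalizing constant numerically with integrate:
post<-function(t) exp(-10*(t-1.5)^2/2)/(pi*(1+t^2))  # unnormalized posterior, n = 10, xbar = 1.5
k<-integrate(post,-Inf,Inf)$value                    # normalizing constant
curve(post(x)/k,add=TRUE,lwd=2)                      # overlay on the histogram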
2. (i) Let ϕ = log(θ/(1 − θ)). A priori, ϕ ∼ N(0, 1) with
π(ϕ) = (1/√(2π)) exp{−ϕ²/2} .
dϕ/dθ = 1/θ + 1/(1 − θ) = 1/(θ(1 − θ)) ⇒ dϕ = dθ/(θ(1 − θ)) .
Therefore, by the change of variables formula,
π(θ) = (1/√(2π)) × 1/(θ(1 − θ)) exp{−[log(θ/(1 − θ))]²/2} .
(ii) Current value is θ^(t). Generate proposal θ* from the prior, with density
g(θ*) = (1/√(2π)) × 1/(θ*(1 − θ*)) exp{−[log(θ*/(1 − θ*))]²/2} .
Accept with probability
min{1, [f(x|θ*)π(θ*)g(θ^(t))] / [f(x|θ^(t))π(θ^(t))g(θ*)]}
= min{1, f(x|θ*)/f(x|θ^(t))} as π(θ) = g(θ)
= min{1, (θ*/θ^(t))^8 ((1 − θ*)/(1 − θ^(t)))^2} as f(x|θ) ∝ θ^8(1 − θ)^2 .
Metropolis-Hastings algorithm:
t = 0: θ^(0) = 0.5.
Generate proposal θ* from the prior, log(θ*/(1 − θ*)) ∼ N(0, 1). Using the first standard normal random number, log(θ*/(1 − θ*)) = −0.6962, and therefore θ* = e^(−0.6962)/(1 + e^(−0.6962)) = 0.3326.
Accept with probability (0.3326/0.5)^8 (0.6674/0.5)^2 = 0.0684.
Generate uniform random number: 0.5774 > 0.0684, so reject the proposal, θ^(1) = θ^(0) = 0.5.
t = 1: θ^(1) = 0.5.
Generate proposal θ* from the prior. Using the second standard normal random number, log(θ*/(1 − θ*)) = 1.0821, and therefore θ* = e^(1.0821)/(1 + e^(1.0821)) = 0.7469.
Accept with probability (0.7469/0.5)^8 (0.2531/0.5)^2 = 6.353. As 6.353 > 1, we accept this proposal.
Therefore, θ^(2) = θ* = 0.7469.
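For interest, here is a minimal sketch of this independence sampler in R (the function name lnorm.mh is my own; n = 10 and Σxi = 8 are hard-coded):
lnorm.mh<-function(m,theta0)
{
theta<-numeric(m)
theta[1]<-theta0
for(i in 1:(m-1))
{
phi<-rnorm(1)                     # proposal on the logit scale, from the prior
prop<-exp(phi)/(1+exp(phi))       # back-transform to (0, 1)
ratio<-(prop/theta[i])^8*((1-prop)/(1-theta[i]))^2   # likelihood ratio
if(runif(1)<min(1,ratio)) theta[i+1]<-prop else theta[i+1]<-theta[i]
}
return(theta)
}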
3. (i) MCMC can be used to generate samples from arbitrary distributions, not just posteriors. Here,
π(θ) ∝ exp{−(θ − 5)²/(2 × 1.5²)} .
For a random walk algorithm with a Normal proposal distribution, the acceptance probability is given by
α = min{1, exp{−(θ⋆ − 5)²/(2 × 1.5²)} / exp{−(θ^(t) − 5)²/(2 × 1.5²)}} .
Your R function might look like this:
norm.mcmc<-function(m,sigma,theta0)
{
accept<-0
theta<-NULL
theta[1]<-theta0
for(i in 1:(m-1))
{
prop<-rnorm(1,theta[i],sigma)     # random walk proposal
ratio<-exp(-(prop-5)^2/(2*1.5^2))/exp(-(theta[i]-5)^2/(2*1.5^2))
alpha<-min(1,ratio)
if(runif(1)<alpha)
{
theta[i+1]<-prop
accept<-accept+1
}
else
{
theta[i+1]<-theta[i]
}
}
return(list(theta=theta,accept=accept/m))
}
The function returns the MCMC sample in theta, and the acceptance ratio in accept.
Trying different values of the proposal scaling, σ, I needed to use σ ≥ 6 to get an acceptance ratio between 0.1 and 0.4. With a starting value of θ^(0) = 0, a small proposal scaling led to too heavy a left tail in the sampled distribution and, sometimes, a downward-biased mean. You can use the function density to estimate the density from our MCMC sample, and also for a sample from rnorm.
For σ = 1:
> theta.mcmc<-norm.mcmc(1000,1,0)
> plot(density(theta.mcmc$theta))
> theta.rnorm<-rnorm(1000,5,1.5)
> lines(density(theta.rnorm),lty=2)
> theta.mcmc$accept
[1] 0.783
[Figure: kernel density estimates of the MCMC sample (solid) and the rnorm sample (dashed), σ = 1.]
For σ = 6:
> theta.mcmc<-norm.mcmc(1000,6,0)
> plot(density(theta.mcmc$theta))
> theta.rnorm<-rnorm(1000,5,1.5)
> lines(density(theta.rnorm),lty=2)
> theta.mcmc$accept
[1] 0.29
[Figure: kernel density estimates of the MCMC sample (solid) and the rnorm sample (dashed), σ = 6.]
(ii) Here,
π(θ) ∝ θ^(1−1) e^(−0.5θ)
= e^(−0.5θ) , θ > 0 .
For a random walk algorithm with a Normal proposal distribution, the acceptance probability is given by
α = min{1, e^(−0.5θ⋆)/e^(−0.5θ^(t))} for θ⋆ > 0, and α = 0 for θ⋆ ≤ 0.
Your R function might look like this:
gamma.mcmc<-function(m,sigma,theta0)
{
accept<-0
theta<-NULL
theta[1]<-theta0
for(i in 1:(m-1))
{
prop<-rnorm(1,theta[i],sigma)     # random walk proposal
if(prop<0)
{
ratio<-0                          # pi(theta) = 0 for theta <= 0
}
else
{
ratio<-exp(-0.5*prop)/exp(-0.5*theta[i])
}
alpha<-min(1,ratio)
if(runif(1)<alpha)
{
theta[i+1]<-prop
accept<-accept+1
}
else
{
theta[i+1]<-theta[i]
}
}
return(list(theta=theta,accept=accept/m))
}
This is only a small change to the last function - just the acceptance probability, remembering that π(θ) = 0 for θ ≤ 0. Again, I needed a large σ (= 5) to get an acceptance ratio between 0.1 and 0.4. We can compare to a sample from rgamma by plotting the density estimates.
> theta.mcmc<-gamma.mcmc(1000,1,0)
> plot(density(theta.mcmc$theta),main="Kernel density estimate",xlab="theta")
> theta.rgamma<-rgamma(1000,1,0.5)
> lines(density(theta.rgamma),lty=2)
> legend("topright",legend=c("MCMC","rgamma"),lty=1:2)
> theta.mcmc$accept
[1] 0.746
[Figure: kernel density estimates of the MCMC (solid) and rgamma (dashed) samples, σ = 1.]
> theta.mcmc<-gamma.mcmc(1000,5,0)
> plot(density(theta.mcmc$theta),main="Kernel density estimate",xlab="theta")
> theta.rgamma<-rgamma(1000,1,0.5)
> lines(density(theta.rgamma),lty=2)
> legend("topright",legend=c("MCMC","rgamma"),lty=1:2)
> theta.mcmc$accept
[1] 0.281
[Figure: kernel density estimates of the MCMC (solid) and rgamma (dashed) samples, σ = 5.]
These examples are illustrative, and your mileage may vary. I suggest generating lots of chains of length m = 1000 and seeing how they vary. If you try m = 10000, you should see much less variability.
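For example, one quick way to see this chain-to-chain variability (the seed and the number of replicate chains below are my own choices):
set.seed(1)
means.1000<-replicate(20,mean(gamma.mcmc(1000,5,0)$theta))
means.10000<-replicate(20,mean(gamma.mcmc(10000,5,0)$theta))
sd(means.1000); sd(means.10000)   # the longer chains should vary much less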
4. (i) Using a random walk algorithm and a non-informative prior distribution, the acceptance probability is just the ratio of the likelihoods:
α = min{1, [(2 + θ⋆)^125 (1 − θ⋆)^38 θ⋆^34] / [(2 + θ^(t))^125 (1 − θ^(t))^38 (θ^(t))^34]} , 0 < θ⋆ < 1 (the factors of 1/4 cancel).
Your R code might look a bit like this:
genetic.metropolis<-function(m,sigma,theta0)
{
accept<-0
theta<-NULL
theta[1]<-theta0
for(i in 1:(m-1))
{
prop<-rnorm(1,theta[i],sigma)     # random walk proposal
if(prop<0 || prop>1)
{
ratio<-0                          # pi(theta|x) = 0 outside (0, 1)
}
else
{
lp<-(0.25*(2+prop))^125*(0.25*(1-prop))^18*(0.25*(1-prop))^20*(0.25*prop)^34
lt<-(0.25*(2+theta[i]))^125*(0.25*(1-theta[i]))^18*(0.25*(1-theta[i]))^20*(0.25*theta[i])^34
ratio<-lp/lt
}
alpha<-min(1,ratio)
if(runif(1)<alpha)
{
theta[i+1]<-prop
accept<-accept+1
}
else
{
theta[i+1]<-theta[i]
}
}
return(list(theta=theta,accept=accept/m))
}
This function is very similar to those in question 3. Notice that we only accept moves with 0 < θ⋆ < 1, the support of π(θ|x). Once again, we can investigate the choice of the proposal scaling σ - I had to use a value of around 0.2-0.3 to achieve an acceptance ratio of about 0.2.
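For instance, you could tabulate the acceptance ratio over a grid of scalings (the grid and the starting value of 0.5 below are my own choices):
for(s in c(0.05,0.1,0.2,0.3,0.5,1))
{
fit<-genetic.metropolis(10000,s,0.5)
cat("sigma =",s," acceptance ratio =",fit$accept,"\n")
}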
> theta.met<-genetic.metropolis(10000,0.2,0)
> theta.met$accept
[1] 0.2992
> plot(density(theta.met$theta),main="Kernel density estimate",xlab="theta")
> summary(theta.met$theta)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.5883  0.6236  0.6213  0.6578  0.7853
[Figure: kernel density estimate of the posterior sample for θ.]
The long left-hand tail of the density plot is caused by my (poor) choice of starting value θ^(0) = 0. We can also check this by looking at the trace plot:
> par(mfrow=c(2,1))
> plot(1:10000,theta.met$theta,type="l",xlab="iteration",ylab="theta",main="Trace plot")
> acf(theta.met$theta,ylab="autocorrelation",main="Autocorrelation plot")
[Figure: trace plot of θ against iteration, and autocorrelation plot.]
The trace plot shows how the chain quickly moves away from the starting value and then settles, exploring the region around the posterior mode. We may want to discard the first few (around 100) observations as burn-in. The chain is mixing well, not getting stuck in any local regions of the parameter space. The function acf estimates and plots the autocorrelation function - the correlation between values a particular number of iterations apart ("lag"). This plot looks OK - the correlation drops to near zero by lag 10; that is, observations in the chain 10 or more iterations apart are approximately uncorrelated.
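If you do decide to discard a burn-in, it is a one-line operation (the cut-off of 100 iterations is an illustrative choice):
theta.post<-theta.met$theta[-(1:100)]   # drop the first 100 iterations as burn-in
summary(theta.post)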
For interest, here are bad trace and autocorrelation plots, for proposal scaling σ = 10. We are accepting far too few moves, leading to the chain becoming stuck in local regions of the parameter space and not mixing properly, and the correlations remaining high.
[Figure: trace plot and autocorrelation plot for proposal scaling σ = 10.]
(ii) We have
π(θ|x, z) ∝ θ^(z+34) (1 − θ)^38 .
Hence, θ|x, z ∼ Beta(z + 35, 39).
(iii) Given x and θ, there are 125 observations in cells 1 and 2 (AB1 and AB2). Hence, z is the number of “successes” (AB2s) out of 125. Given that the observation is in either cell 1 or 2, the probability of being in cell 2 is
(θ/4) / (1/2 + θ/4) = θ / (θ + 2) .
(iv) Gibbs sampling uses the conditional distributions, sampling from each using the most recent value of the other parameter, i.e.:
z^(t) ∼ Binomial(125, θ^(t−1)/(θ^(t−1) + 2))
θ^(t) ∼ Beta(z^(t) + 35, 39) .
Hence, we no longer need a proposal distribution, or to calculate an acceptance probability (every move is accepted). The R code is therefore very straightforward:
genetic.gibbs<-function(m,theta0,z0)
{
theta<-NULL
z<-NULL
theta[1]<-theta0
z[1]<-z0
for(i in 1:(m-1))
{
z[i+1]<-rbinom(1,125,theta[i]/(theta[i]+2))   # z | theta, x
theta[i+1]<-rbeta(1,z[i+1]+35,39)             # theta | z, x
}
return(list(theta=theta,z=z))
}
We can produce density plots and summary statistics for θ and z:
> theta.gibbs<-genetic.gibbs(10000,0.5,62)
> par(mfrow=c(1,2))
> plot(density(theta.gibbs$theta),main="Kernel density estimate - theta",xlab="theta")
> plot(density(theta.gibbs$z),main="Kernel density estimate - z",xlab="z")
> summary(theta.gibbs$theta)
   Min. 1st Qu.  Median    Mean