
MATH6174: Exercise Sheet 9

1.  Rejection sampling. Assume that X1, . . . , Xn are independent identically distributed N(θ, 1) observations. Suppose that the prior distribution for θ is Cauchy with density

π(θ) = 1/(π(1 + θ^2)) ,    −∞ < θ < ∞ .

Derive, up to a constant of proportionality, the posterior density of θ. Suppose that n = 10 and x̄ = 1.5. Write a short R function to implement rejection sampling for this density, using the prior density as g(θ). Plot your sample as a histogram.    [You will need to find c such that π(θ|x)/g(θ) ≤ c.]

2.  Suppose that x1 , . . . ,xn  are i.i.d. observations from a Bernoulli distribution with mean θ .

(i) A logistic normal prior distribution is proposed for θ (a normal distribution for log(θ/(1 − θ))). Show that if the prior mean and variance for log(θ/(1 − θ)) are 0 and 1 respectively then the prior density function for θ is

π(θ) = 1/(√(2π) θ(1 − θ)) exp{ −(1/2)(log(θ/(1 − θ)))^2 } ,    0 < θ < 1 .

(ii) As this prior distribution is not conjugate, the Bayes estimator E(θ|x1, . . . , xn) is not directly available. It is proposed to estimate it using a Monte Carlo sample generated by the Metropolis-Hastings method. One possible algorithm involves generating proposals from the prior distribution, independently of the current observation. If n = 10, Σ xi = 8 and the current observation is θ = 0.5, generate the next two observations using this algorithm.

Standard normal random numbers: 0.6962, 1.0821, 0.9379.

Uniform random numbers: 0.5774, 0.9956, 0.6006.

3.  [R question] Write R code to generate θ samples when

(i) θ ∼ N(5, 1.5^2)        (ii) θ ∼ Gamma(1, 0.5)

using a Metropolis (random walk) algorithm with a N(θ^(t), σ^2) proposal distribution. Investigate the choice of proposal scaling σ in both cases, and compare your MCMC samples to draws from rnorm and rgamma.

4.  [R question] Consider the following famous example on genetic linkage (Rao, 1965, pp. 368-369). A particular genetic model gives the probabilities for four genotypes AB, Ab, aB and ab as (2 + θ)/4, (1 − θ)/4, (1 − θ)/4 and θ/4, where 0 < θ < 1 is the square of the recombination fraction, and is the parameter we wish to estimate.


This is a multinomial distribution with four cells, and likelihood function

f(x|θ) ∝ [ (2 + θ)/4 ]^x1  [ (1 − θ)/4 ]^x2  [ (1 − θ)/4 ]^x3  [ θ/4 ]^x4 ,

where x = (x1, . . . , x4) are the observed frequencies of the four genotypes.  We observe the frequencies x1 = 125, x2 = 18, x3 = 20, x4 = 34.

(i) Write an R function to generate from the posterior distribution of θ, given the non-informative prior π(θ) ∝ 1, using a random walk Metropolis algorithm. Use your function to explore different choices of random walk standard deviation, and to produce trace plots and autocorrelation plots for θ.

We can also set this example up as a missing data problem, and use the Gibbs sampler (see below) to produce a sample from the posterior distribution of θ. Consider splitting one of the cells in two, to produce a multinomial with five cells.  We will split the first genotype, AB, into two subgroups, AB1 and AB2, with cell probabilities 1/2 and θ/4 respectively.  We do not observe the split between AB1 and AB2.  Let z denote the number of observations in the AB2 cell.  Then, given a uniform prior density for θ, the posterior density conditional on z would be

π(θ|x, z)   ∝   f(x|θ, z) π(θ)

            =   [ 1/2 ]^(125 − z)  [ θ/4 ]^z  [ (1 − θ)/4 ]^18  [ (1 − θ)/4 ]^20  [ θ/4 ]^34 .

(ii) What is the conditional distribution of θ|z, x?

(iii)  Give an argument to show that the conditional distribution of z|θ, x is Binomial(125, θ/(θ + 2)).

We can generate a sample from a joint distribution π(θ1 , . . . ,θp |x) by sampling from the conditional distributions for each parameter, given the last values of the other parameters, i.e.

θ1^(t) ∼ π(θ1 | x, θ2^(t−1), . . . , θp^(t−1))

θ2^(t) ∼ π(θ2 | x, θ1^(t), θ3^(t−1), . . . , θp^(t−1))

. . .

θp^(t) ∼ π(θp | x, θ1^(t), . . . , θ(p−1)^(t)) .

MATH6174: Exercise Sheet 9 Solutions

1.  Rejection sampling is a general-purpose algorithm for simulating random draws from a given probability distribution.  The distribution need not have a standard, familiar form. We will describe the algorithm in terms of producing an independent sample from the posterior distribution π(θ|x). The first step is to identify another probability density g(θ) such that

• it is easy to simulate draws from g(θ) (e.g. using the r* functions in R)

• the density g(θ) resembles the posterior density, π(θ|x), of interest in terms of location and spread

• for some constant c, π(θ|x) ≤ c g(θ) for all θ.

The final point above implies that π(θ|x)/g(θ) ≤ c. If we are able to find a density g(θ) with these properties, then draws from π(θ|x) are obtained by the following accept/reject algorithm:

(1)  Simulate independently θ from g(θ) and a uniform random variate U from (0,1).

(2) If U  ≤ π(θ|x)/cg(θ), then accept θ as a draw from the posterior density; otherwise, reject θ .

(3)  Continue steps (1) and (2) until the required number of accepted θ draws has been collected.

Note that if c > sup_θ π(θ|x)/g(θ), then the algorithm will still work but be less efficient (more proposals are rejected than necessary), so we take c as small as possible. The posterior, up to a normalizing constant, is given by

π(θ|x) ∝ 1/(1 + θ^2) × exp{ −n(θ − x̄)^2/2 } .

In the rejection method, g(θ) = 1/(π(1 + θ^2)). Therefore, to find c,

c  =  sup_{−∞ < θ < ∞} π(θ|x)/g(θ)  =  sup_{−∞ < θ < ∞} exp{ −n(θ − x̄)^2/2 }  =  1 ,

with the supremum achieved at θ = x̄.
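As a quick numerical sanity check (my addition, not part of the original solution), you can confirm in R that this ratio is maximised at θ = x̄ with value 1:

n <- 10; xbar <- 1.5
ratio <- function(theta) exp(-n * (theta - xbar)^2 / 2)   # pi(theta|x)/g(theta), up to the cancelled constants
optimize(ratio, interval = c(-10, 10), maximum = TRUE)    # maximum near theta = 1.5, objective close to 1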

Your R function may look a bit like this:

rs.cauchy<-function(samplesize,xbar,n)
{
  m<-0                     # number of accepted draws so far
  theta<-NULL
  while(m < samplesize)
  {
    temp<-rcauchy(1)       # propose from the Cauchy prior g(theta)
    u<-runif(1)
    # accept with probability pi(theta|x)/(c*g(theta)) = exp(-n*(theta - xbar)^2/2), with c = 1
    if(u <= exp(-n*(temp-xbar)^2/2))
    {
      theta[m+1]<-temp
      m<-m+1
    }
  }
  return(theta)
}

theta<-rs.cauchy(1000,1.5,10)

hist(theta,prob=T,xlab=expression(theta),main="")

[Figure: histogram of the 1000 accepted draws of θ.]
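As an optional extra check (not part of the original solution), you could overlay the normalised posterior on the histogram, computing the normalising constant numerically with integrate():

post <- function(theta) dcauchy(theta) * exp(-10 * (theta - 1.5)^2 / 2)  # unnormalised posterior, n = 10, xbar = 1.5
k <- integrate(post, -Inf, Inf)$value                                    # normalising constant
curve(post(x) / k, add = TRUE, lwd = 2)                                  # should track the histogram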

2.    (i) Let ϕ = log(θ/(1 − θ)). A priori, ϕ ∼ N(0, 1) with π(ϕ) = (1/√(2π)) exp{ −ϕ^2/2 }.

dϕ/dθ = 1/θ + 1/(1 − θ) = 1/(θ(1 − θ))   ⇒   dϕ = dθ/(θ(1 − θ)) .

Therefore

π(θ) = 1/(√(2π) θ(1 − θ)) exp{ −(1/2)(log(θ/(1 − θ)))^2 } ,    0 < θ < 1 .
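A quick simulation check of this density (my own addition, not required by the question): simulate ϕ ∼ N(0, 1), transform to θ, and compare the histogram with the derived density.

phi <- rnorm(100000)                     # phi = log(theta/(1 - theta)) ~ N(0, 1)
theta <- exp(phi) / (1 + exp(phi))       # inverse logit transform
hist(theta, prob = TRUE, breaks = 50, main = "", xlab = expression(theta))
prior <- function(t) exp(-0.5 * (log(t / (1 - t)))^2) / (sqrt(2 * pi) * t * (1 - t))
curve(prior(x), add = TRUE, lwd = 2)     # derived density should track the histogram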

(ii)  Current value is θ_t. Generate proposal θ* from the prior, i.e. from

g(θ*) = 1/(√(2π) θ*(1 − θ*)) exp{ −(1/2)(log(θ*/(1 − θ*)))^2 } .

Accept with probability

min{ 1, [π(θ*|x) g(θ_t)] / [π(θ_t|x) g(θ*)] }   =   min{ 1, [f(x|θ*) π(θ*) g(θ_t)] / [f(x|θ_t) π(θ_t) g(θ*)] }

=   min{ 1, f(x|θ*)/f(x|θ_t) }        as π(θ) = g(θ)

=   min{ 1, [θ*^8 (1 − θ*)^2] / [θ_t^8 (1 − θ_t)^2] }        as f(x|θ) ∝ θ^8 (1 − θ)^2 .

Metropolis-Hastings algorithm:

t = 0: θ_0 = 0.5.

Generate proposal θ* from the prior, log(θ*/(1 − θ*)) ∼ N(0, 1).  Using the first standard normal random number, log(θ*/(1 − θ*)) = −0.6962, and therefore θ* = e^(−0.6962)/(1 + e^(−0.6962)) = 0.3327.

Accept with probability min{ 1, (0.3327/0.5)^8 (0.6673/0.5)^2 } = 0.0684.

Generate uniform random number 0.5774 > 0.0684, so reject the proposal, θ_1 = θ_0 = 0.5.

t = 1: θ_1 = 0.5.

Generate proposal θ* from the prior, log(θ*/(1 − θ*)) ∼ N(0, 1).  Using the second standard normal random number, log(θ*/(1 − θ*)) = 1.0821, and therefore θ* = e^(1.0821)/(1 + e^(1.0821)) = 0.7469.

Accept with probability min{ 1, (0.7469/0.5)^8 (0.2531/0.5)^2 } = min{ 1, 6.353 }.  As 6.353 > 1, we accept this proposal.

Therefore, θ_2 = θ* = 0.7469.
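For interest, here is a minimal R sketch of this independence sampler (the function name mh.bern is my own, not part of the original solution); run for many iterations, its sample mean approximates E(θ|x1, . . . , xn).

mh.bern <- function(m, theta0)
{
  theta <- numeric(m)
  theta[1] <- theta0
  for (i in 1:(m - 1)) {
    phi <- rnorm(1)                        # proposal on the log-odds scale
    prop <- exp(phi) / (1 + exp(phi))      # proposal drawn from the prior for theta
    # prior and proposal densities cancel, leaving a likelihood ratio
    ratio <- (prop^8 * (1 - prop)^2) / (theta[i]^8 * (1 - theta[i])^2)
    if (runif(1) < min(1, ratio)) theta[i + 1] <- prop else theta[i + 1] <- theta[i]
  }
  theta
}
theta.bern <- mh.bern(10000, 0.5)
mean(theta.bern)                           # Monte Carlo estimate of the Bayes estimator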

3.    (i) MCMC can be used to generate samples from arbitrary distributions, not just posteriors. Here,

π(θ) ∝ exp{ −(θ − 5)^2/(2 × 1.5^2) } .

For a random walk algorithm with a Normal proposal distribution, the acceptance probability is given by

α = min{ 1, exp{ −(θ* − 5)^2/(2 × 1.5^2) } / exp{ −(θ^(t) − 5)^2/(2 × 1.5^2) } } .

Your R function might look like this:

norm.mcmc<-function(m,sigma,theta0)
{
  accept<-0
  theta<-NULL
  theta[1]<-theta0
  for(i in 1:(m-1))
  {
    prop<-rnorm(1,theta[i],sigma)   # random walk proposal centred at the current value
    ratio<-exp(-(prop-5)^2/(2*1.5^2))/exp(-(theta[i]-5)^2/(2*1.5^2))
    alpha<-min(1,ratio)
    if(runif(1)<alpha)
    {
      theta[i+1]<-prop
      accept<-accept+1
    }
    else
    {
      theta[i+1]<-theta[i]
    }
  }
  return(list(theta=theta,accept=accept/m))
}

The function returns the MCMC sample in theta, and the acceptance ratio in accept.

Trying different values of the proposal scaling, σ, I needed to use σ ≥ 6 to get an acceptance ratio between 0.1 and 0.4. With a starting value of θ^(0) = 0, a small proposal scaling led to too heavy a left tail in the estimated distribution and, sometimes, a downward-biased mean. You can use the function density to estimate the density from our MCMC sample, and also from a sample generated with rnorm.

For σ = 1:

> theta.mcmc<-norm.mcmc(1000,1,0)
> plot(density(theta.mcmc$theta))
> theta.rnorm<-rnorm(1000,5,1.5)
> lines(density(theta.rnorm),lty=2)
> theta.mcmc$accept
[1] 0.783

[Figure: kernel density estimates of the MCMC sample (solid) and the rnorm sample (dashed), σ = 1.]

For σ = 6:

> theta.mcmc<-norm.mcmc(1000,6,0)
> plot(density(theta.mcmc$theta))
> theta.rnorm<-rnorm(1000,5,1.5)
> lines(density(theta.rnorm),lty=2)
> theta.mcmc$accept
[1] 0.29

[Figure: kernel density estimates of the MCMC sample (solid) and the rnorm sample (dashed), σ = 6.]

(ii) Here,

π(θ) ∝ θ^(1−1) e^(−0.5θ)

     =  e^(−0.5θ) ,    θ > 0 .

For a random walk algorithm with a Normal proposal distribution, the acceptance probability is given by

α = min{ 1, e^(−0.5θ*) / e^(−0.5θ^(t)) } ,    for θ* > 0 .

Your R function might look like this:

gamma.mcmc<-function(m,sigma,theta0)
{
  accept<-0
  theta<-NULL
  theta[1]<-theta0
  for(i in 1:(m-1))
  {
    prop<-rnorm(1,theta[i],sigma)
    if(prop<0)
    {
      ratio<-0                 # proposals outside the support are never accepted
    }
    else
    {
      ratio<-exp(-0.5*prop)/exp(-0.5*theta[i])
    }
    alpha<-min(1,ratio)
    if(runif(1)<alpha)
    {
      theta[i+1]<-prop
      accept<-accept+1
    }
    else
    {
      theta[i+1]<-theta[i]
    }
  }
  return(list(theta=theta,accept=accept/m))
}

This is only a small change to the last function - just changing the acceptance probability, remembering that π(θ) = 0 for θ ≤ 0.  Again, I needed large σ (=5) to get an acceptance ratio between 0.1 and 0.4. We can compare to a sample from rgamma by plotting the density estimates.

> theta.mcmc<-gamma.mcmc(1000,1,0)
> plot(density(theta.mcmc$theta),main="Kernel density estimate",xlab="theta")
> theta.rgamma<-rgamma(1000,1,0.5)
> lines(density(theta.rgamma),lty=2)
> legend("topright",legend=c("MCMC","rgamma"),lty=1:2)
> theta.mcmc$accept
[1] 0.746

[Figure: kernel density estimates of the MCMC sample (solid) and the rgamma sample (dashed), σ = 1.]

> theta.mcmc<-gamma.mcmc(1000,5,0)
> plot(density(theta.mcmc$theta),main="Kernel density estimate",xlab="theta")
> theta.rgamma<-rgamma(1000,1,0.5)
> lines(density(theta.rgamma),lty=2)
> legend("topright",legend=c("MCMC","rgamma"),lty=1:2)
> theta.mcmc$accept
[1] 0.281

[Figure: kernel density estimates of the MCMC sample (solid) and the rgamma sample (dashed), σ = 5.]

These examples are illustrative, and your mileage may vary.  I suggest generating lots of chains of length m = 1000 and seeing how they vary. If you try m = 10000, you should see much less variability.
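A sketch of that comparison (assuming the gamma.mcmc function above; the object names are my own):

set.seed(1)                                            # for reproducibility
means.1k  <- replicate(20, mean(gamma.mcmc(1000, 5, 0)$theta))
means.10k <- replicate(20, mean(gamma.mcmc(10000, 5, 0)$theta))
summary(means.1k)     # spread of the chain means across 20 runs with m = 1000
summary(means.10k)    # noticeably less variable with m = 10000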

4.    (i) Using a random walk algorithm and a non-informative prior distribution, the acceptance probability is just the ratio of the likelihoods:

α = min{ 1, [ (2 + θ*)^125 (1 − θ*)^18 (1 − θ*)^20 θ*^34 ] / [ (2 + θ^(t))^125 (1 − θ^(t))^18 (1 − θ^(t))^20 (θ^(t))^34 ] } ,    0 < θ* < 1 .

Your R code might look a bit like this:

genetic.metropolis<-function(m,sigma,theta0)
{
  accept<-0
  theta<-NULL
  theta[1]<-theta0
  for(i in 1:(m-1))
  {
    prop<-rnorm(1,theta[i],sigma)
    if(prop<0 || prop>1)
    {
      ratio<-0                 # proposals outside (0,1) are never accepted
    }
    else
    {
      # likelihood at the proposed and current values (the 1/4 factors cancel in the ratio)
      lp<-(0.25*(2+prop)^125)*(0.25*(1-prop)^18)*(0.25*(1-prop)^20)*(0.25*prop^34)
      lt<-(0.25*(2+theta[i])^125)*(0.25*(1-theta[i])^18)*(0.25*(1-theta[i])^20)*(0.25*theta[i]^34)
      ratio<-lp/lt
    }
    alpha<-min(1,ratio)
    if(runif(1)<alpha)
    {
      theta[i+1]<-prop
      accept<-accept+1
    }
    else
    {
      theta[i+1]<-theta[i]
    }
  }
  return(list(theta=theta,accept=accept/m))
}

This function is very similar to those in question 3. Notice how we restrict to accepting moves only in 0 < θ < 1, the range of π(θ|x). Once again, we can investigate the choice of the proposal scaling σ: I had to use a value of around 0.2-0.3 to achieve an acceptance ratio of about 0.2.

> theta.met<-genetic.metropolis(10000,0.2,0)
> theta.met$accept
[1] 0.2992
> plot(density(theta.met$theta),main="Kernel density estimate",xlab="theta")
> summary(theta.met$theta)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.5883  0.6236  0.6213  0.6578  0.7853

[Figure: kernel density estimate of the posterior sample for θ.]

The long left-hand tail of the density plot is caused by my (poor) choice of starting value θ^(0) = 0. We can also check this by looking at the trace plot:

> par(mfrow=c(2,1))
> plot(1:10000,theta.met$theta,type="l",xlab="iteration",ylab="theta",main="Trace plot")
> acf(theta.met$theta,ylab="autocorrelation",main="Autocorrelation plot")

[Figure: trace plot of θ against iteration (top) and autocorrelation plot (bottom) for σ = 0.2.]

The trace plot shows how the chain quickly moves away from the starting value, and then settles, exploring the region around the posterior mode. We may want to discard the first few (∼100) observations as burn-in. The chain is mixing well, not getting stuck in any local regions of the parameter space. The function acf estimates and plots the autocorrelation function: the correlation between values a particular number of iterations apart (the “lag”). This plot looks OK: the correlation drops to near zero by lag 10, so observations in the chain 10 or more iterations apart are approximately uncorrelated.
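As a sketch of how that might look in practice (using the theta.met run above; the object name theta.post is my own), discard the first 100 iterations before summarising, and check the autocorrelations numerically:

theta.post <- theta.met$theta[-(1:100)]        # drop the first 100 iterations as burn-in
summary(theta.post)
acf(theta.post, lag.max = 10, plot = FALSE)    # autocorrelations at lags 0-10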

For interest, here are bad trace and autocorrelation plots, for proposal scaling σ = 10.  We are accepting far too few moves, leading to the chain becoming stuck in local regions of the parameter space and not mixing properly, and the correlations remaining high.

[Figure: trace plot and autocorrelation plot for proposal scaling σ = 10, showing poor mixing and slowly decaying autocorrelation.]

(ii) We have

π(θ|x, z) ∝ θ^(z+34) (1 − θ)^38 ,

Hence, θ|x,z ∼ Beta(z + 35, 39).

(iii)  Given x and θ, there are 125 observations in cells 1 and 2 (AB1 and AB2).  Hence, z is the number of “successes” (AB2s) out of 125.  Given that an observation is in either cell 1 or 2, the probability of it being in cell 2 is

(θ/4) / (1/2 + θ/4)  =  θ/(θ + 2) .

(iv)  Gibbs sampling uses the conditional distributions, sampling from each using the most recent values of the other parameter, i.e.:

z^(t) ∼ Binomial( 125, θ^(t−1)/(θ^(t−1) + 2) )

θ^(t) ∼ Beta( z^(t) + 35, 39 ) .

Hence, we no longer need a proposal distribution, or to calculate an acceptance probability (every move is accepted). The R code is therefore very straightforward:

genetic.gibbs<-function(m,theta0,z0)
{
  theta<-NULL
  z<-NULL
  theta[1]<-theta0
  z[1]<-z0
  for(i in 1:(m-1))
  {
    # sample z | theta, x and then theta | z, x
    z[i+1]<-rbinom(1,125,(theta[i]/(theta[i]+2)))
    theta[i+1]<-rbeta(1,z[i+1]+35,39)
  }
  return(list(theta=theta,z=z))
}

We can produce density plots and summary statistics for θ and z:


> theta.gibbs<-genetic.gibbs(10000,0.5,62)
> par(mfrow=c(1,2))
> plot(density(theta.gibbs$theta),main="Kernel density estimate - theta",xlab="theta")
> plot(density(theta.gibbs$z),main="Kernel density estimate - z",xlab="z")
> summary(theta.gibbs$theta)
   Min. 1st Qu.  Median    Mean