STAT 7100: Introduction to Advanced Statistical Inference Examples and Illustrations for part B of learning unit 7
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
STAT 7100: Introduction to Advanced Statistical Inference
Examples and Illustrations for part B of learning unit 7
Example: Metropolis-Hastings calculation of Binomial counts analysis. Suppose XIθ ~ binomial(n, θ) counts the number of incidences in a binomial experiment with n trials. Prior uncertainty about the incidence probability is described by a beta distribution, P ~ beta(α, β), having written P for a random variable whose possible values are those of θ . The associated data-generating PMF and prior PDF are given by
fx-P(xIθ) = ╱ 、x(n) θ2 (1 - θ)ny2 and fP (θ) = θay] (1 - θ)β y]
Suppose further that x = 3“heads”are observed in n = 10 trials.
We examined this setup in a previous unit, wherein it was noted that the posterior dis- tribution is P IX = x ~ beta(α + x, β + n - x), which makes possible direct calculation of posterior quantiles using software with built-in functions. This calculation provides the following table of posterior quantiles across three settings of the prior parameters α and β .
quantile |
(α, β) (1/2, 1/2) |
2.5% 25% 50% 75% 97.5% mean |
0.0740 0.1949 0.2860 0.3908 0.6012 0.3001 |
An alternative approach to calculating these quantiles is to sample from the posterior distribution using a Metropolis-Hastings algorithm. From the forms of the data-generating PMF, fx-P(xIθ) x θ2 (1 - θ)ny2 , and prior PDF, fP (θ) x θ ay] (1 - θ)β y] , the posterior PDF has the form
fP -x (θIx) x fx-P(xIθ)fP (θ) x θa+2y] (1 - θ)β+ny2y] .
In a Metropolis-Hastings algorithm, the acceptance probability of a proposed update γ of the incidence probability θ is therefore
ρ = min , , 1 ← = min 宁 ╱ 、 a+2y] ╱ 、β+ny2y] , 1 ( ,
where fQ-P(γIθ) is the PDF of the proposal distribution.
In this computational approach, one interesting procedure for generating a proposed up- date γ of θ is defined by the following steps: First, simulate a value u from a uniform distribution between -c and c (i.e., -c < u < c). Next, assign the update according to
) - (θ + u) if θ + u < 0
γ = l θ + u if 0 < θ + u < 1
Interpreting these steps, the procedure works by simulating a symmetric shift of θ, fol- lowed by checking whether the value falls outside the unit interval; if it does, the value is reflected back into the interval by the same distance it fell outside of it. This type of proposal distribution is sometimes called a uniform, symmetric shift with reflecting bound- aries. What is especially valuable about it is that the corresponding conditional density is guaranteed to satisfy fQ-P(γIθ) = fQ-P(θIγ), which implies that the density cancels in the acceptance probability formula, and the formula simplifies to
ρ = min , , 1 ← = min 宁 ╱ 、 a+2y] ╱ 、β+ny2y] , 1 ( .
An accompanying R code demonstrates how this algorithm can reproduce the table of posterior quantiles that appears above.
Example: Weibull sampling. Suppose k = (X] , . . . , Xn ) is an IID random sample such that each Xi has PDF
fx-P(xIξ, λ) = ξλx5 y] ey2s A .
This defines a Weibull distribution with parameters ξ and β = 1/λ. Suppose the prior distribution is formulated to that ξ and λ follow independent gamma distributions,
fP (ξ, λ) = ╱ ξay] eyb5 、 ╱ λcy] eydA 、.
● One possible proposal distribution specifies the generation of a proposed update (ξ* , λ* ) from independent exponential distributions, with means ξ and λ. The PDF is
fQ-P (ξ* , λ* Iξ, λ) = ey85 * /5+A* /A〉,
This proposal does not define a random walk. The associated acceptance probabil- ity formula is somewhat complicated: ρ = min{e( , 1}, where
ζ = (n + a - 2) log(ξ* /ξ) + (n + c - 2) log(λ* /λ) +b(ξ - ξ * ) + d(λ - λ* ) + (ξ* /ξ - ξ/ξ * ) + (λ* /λ - λ/λ* )
n n n
+(ξ* - ξ)_ log xi + λ _xi(5) - λ* _ x
ie] ie] ie]
● Another possible proposal distribution is defined according to log ξ* = log ξ + σ ] ∈ ] and log λ* = log λ + σ2 ∈2
where ∈ ] and ∈2 are independent with symmetric distributions. For instance, if each ∈i uniform(-σi , σi ), then the proposal density is
fQ-P(ξ* , λ* Iξ, λ) =
1 |
4σ] σ2 ξ * λ* |
, for
ξey口1 < ξ * < ξe口1
λey口2 < λ* < λe口2 .
Although elements of this proposal distribution resemble those of a random walk, it does not define one. The complication, here, is that the additive perturbations log ξ* = log ξ + σ∈ ] and log λ* = log λ + σ∈2 are defined on a logarithmic scale, and not directly on the parameters.
To define a random walk, one might re-parameterize the problem according to θ ] = log ξ and θ2 = log λ. The log-likelihood function and prior translate to
n n
l(k; m) = nθ ] + nθ2 + (e91 - 1)_ log xi - e92 _ xi(e)9 1
ie] ie]
and
fP (k) = ╱ ea91 ybe9 1 、 ╱ ec92 yde92 、,
having used the transformation theorem to identify the PDF of the re-parameterized prior.
Under this parameterization, the perturbations γi = θi + σi ∈i define a random-walk
proposal distribution. The acceptance probability is ρ = min ,e( , 1、, where
ζ = (n + a)(γ] - θ ] ) + (n + c)(γ2 - θ2 )
-(ey1 - e91) !n_ie] log xi + b! - ey2 n_ie] xi(e)y 1 + e92 n_ie] x
Example: Beta-binomial distributions. Suppose the random variable P follows a beta- binomial distribution with parameters n, α, and β. Recall that this distribution is defined by the PDF
fP (θ) = ╱ 、θ(n) .
Despite that this PDF is written in a concise form, it can be difficult to work with, especially when n, α, and β are large; thus, Monte Carlo solutions are appealing.
To simulate a beta-binomial distribution, it is sometimes helpful to recall the hierarchical formulation by which P IH = λ ~ binomial(n, λ) and H ~ beta(α, β) implies the beta- binomial distribution above for the marginal distribution of P . Thus, it becomes possible to generate a sample of beta-binomial values, (θ8]〉, . . . , θ 8k〉), by generating a sample of bivariate measurements, ((θ8]〉, λ 8]〉), . . . , (θ8k〉, λ 8k〉)), such that the pair (θ8i〉, λ 8i〉) is gener- ated by first simulating λ 8i〉 from a beta(α, β) distribution and then simulating θ 8i〉 from a binomial(n, λ8i〉) distribution.
Notice that each pair of (θ, λ)-values is simulated independently of all other such pairs in this algorithm, so the Markov chain it defines is trivial. An alternative approach applies the property HIP = θ ~ beta(θ + α, n - θ + β). Here, the sequence ((θ8]〉, λ 8]〉), . . . , (θ8k〉, λ 8k〉)) is simulated as follows.
● (Initialization) Generate (θ8]〉, λ 8]〉) according to the previous algorithm: first simulate λ 8]〉 from a beta(α, β) distribution; then simulate θ8]〉 from a binomial(n, λ8]〉) distribu- tion.
● For i > 1, simulate λ8i〉from a beta(θ8iy]〉+α, n - θ 8iy]〉+β) distribution; then simulate θ 8i〉 from a binomial(n, λ8i〉) distribution.
This alternative algorithm is an example of a Gibbs sampling algorithm. Notice here that the simulation of each λ 8i〉 is specified from a previous value θ8iy]〉, which implies that the random variable pairs (P8i〉, H8i〉) are not independent across i. The sequence of bivariate random variables ((P8]〉, H8]〉), . . . , (P8k〉, H8k〉)) is a nontrivial Markov chain.
Example: Gibbs sampling in a multinomial model. Suppose k = (X] , . . . , X5 ) is a multinomial random vector summarizing n multinomial trials in five categories, whose associated probabilities are parameterized as
p] = a] θ ] + b] , p2 = a2 θ ] + b2 , p3 = a3 θ2 + b3 , p4 = a4 θ2 + b4
and p5 = 1 - (p] +p2 +p3 +p4 ), in which the ai and bi are known constants, each 0 < θi < 1, and θ ] + θ2 < 1. Suppose the prior PDF has the form
fP (k) x θ ](a)1 y] θ2(a)2 y] (1 - θ ] - θ2 )a3 y] .
The posterior distribution associated with this model is challenging to work with directly. However, it is possible to formulate a tractable Gibbs sampling algorithm by augment- ing the multinomial data in five categories k = (X] , . . . , X5 ) to multinomial data in nine categories Y = (Y] , . . . , Y3 ) with associated probabilities
q] = a] θ ] , q2 = b] , p3 = a2 θ ] , q4 = b2 ,
q5 = a3 θ2 , qq = b3 , q之 = a4 θ2 qs = b4
and q3 = 1 - (q] + q2 + q3 + q4 + q5 + qq + q之 + qs ). The relationship between these vectors is
X] = Y] + Y2 , X2 = Y3 + Y4 , X3 = Y5 + Yq , X4 = Y之 + Ys ,
and X5 = Y3 . If Y were observed, one could work with the posterior distribution with PDF
fP -Y (kIy) x θ ](z)1 +z3 +a1 y] θ2(z)5 +z7 +a2 y] (1 - θ ] - θ2 )z9 +a3 y] .
Moreover, a sample from this distribution could be simulated using a Gibbs sampling algorithm based on the conditional distributions
P] I{Y = y, P2 = θ2 } ~ (1 - θ2 ) × beta(y] + y3 + α] , y3 + α3 )
P2 I{Y = y, P] = θ ] } ~ (1 - θ ] ) × beta(y5 + y之 + α2 , y3 + α3 ).
Yet, despite that Y is unobserved, it is possible to simulate from the conditional distribu- tions that connect it to the observed data k according to
Y] I{X] = x] , 扌 = k} ~ binomial ╱x] , 、 ,
Y3 I{X2 = x2 , 扌 = k} ~ binomial ╱x2 , 、 ,
Y5 I{X3 = x3 , 扌 = k} ~ binomial ╱x3 , 、 ,
Y之 I{X4 = x4 , 扌 = k} ~ binomial ╱x4 , 、 .
The associated Gibbs sampling algorithm treats the augmented data as parameters to be updated. At iteration i, the current value of relevant set of “parameters” is (θ](8)i〉, θ y8i〉), which are updated by simulating
θ ](8)i+]〉 ~ {1 - θ2(8)i〉} × beta(y](8)i〉 + y3(8)i〉 + α] , y3(8)i〉 + α3 )
θ2(8)i+]〉 ~ {1 - θ ](8)i〉} × beta(y5(8)i〉 + y之(8)i〉 + α2 , y3(8)i〉 + α3 ),
followed by
y ](8)i+]〉 ~ binomial !x] , ! ,
y5(8)i+]〉 ~ binomial !x3 , ! ,
y3(8)i+]〉 ~ binomial !x2 , ! ,
y之(8)i+]〉 ~ binomial !x4 , ! ,
and then setting y2(8)i+]〉 = x] - y y4(8)i+]〉 = x2 - y yq(8)i+]〉 = x3 - y ys(8)i+]〉 = x4 - y之(8)i+]〉 , and y x5 .
Example: Posterior maximization in a multinomial model using the EM algorithm. Consider the previous example in which k = (X] , . . . , X5 ) is a multinomial random vector summarizing n multinomial trials in five categories, with associated probabilities
p] = a] θ ] + b] , p2 = a2 θ ] + b2 , p3 = a3 θ2 + b3 , p4 = a4 θ2 + b4
and p5 = 1- (p] +p2 +p3 +p4 ). An augmented model has Y = (Y] , . . . , Y3 ), with probabilities
q] = a] θ ] , q2 = b] , p3 = a2 θ ] , q4 = b2 ,
q5 = a3 θ2 , qq = b3 , q之 = a4 θ2 qs = b4 ,
and q3 = 1 - (q] + q2 + q3 + q4 + q5 + qq + q之 + qs ). The connection between them is
X] = Y] + Y2 , X2 = Y3 + Y4 , X3 = Y5 + Yq , X4 = Y之 + Ys ,
and X5 = Y3 . The prior PDF
fP (k) x θ ](a)1 y] θ2(a)2 y] (1 - θ ] - θ2 )a3 y] .
implies the posterior PDF
fP -Y (kIy) x θ ](z)1 +z3 +a1 y] θ2(z)5 +z7 +a2 y] (1 - θ ] - θ2 )z9 +a3 y] .
The posterior PDF defined conditionally on the augmentation, Y , makes feasible the use of the EM algorithm to find the posterior mode. In particular, it provides that within the expectation step, the quantity Q(k; k8i〉) is
Q(k; k8i〉) = Eθ(i) *log fP -Y (kIY)Ik = ml
= (Eθ(i)[Y] + Y3 Ik = m] + α] - 1) log θ]
+(Eθ(i) [Y5 + Y之 Ik = m] + α2 - 1) log θ2
+(x5 + α3 - 1) log(1 - θ ] - θ2 ),
whose conditional expectations are readily evaluated using the properties
Y] I{X] = x] , 扌 = k} ~ binomial ╱x] , 、 ,
Y3 I{X2 = x2 , 扌 = k} ~ binomial ╱x2 , 、 ,
Y5 I{X3 = x3 , 扌 = k} ~ binomial ╱x3 , 、 ,
Y之 I{X4 = x4 , 扌 = k} ~ binomial ╱x4 , 、 ,
as in the previous example; that is,
Eθ(i) [Y] + Y3 Ik = m] = x] + x2
a3 θ2(8)i〉 a4 θ2(8)i〉
Eθ(i) [Y5 + Y之 2(8)i〉 2(8)i〉 + b4 .
Leaving the conditional expectations unevaluated, the relevant first order partial deriva- tives are
Q(k; k8i〉) = -
∂ Q(k; k8i〉) = Eθ (i) [Y5 + Y之 Ik = m] + α2 - 1 - x5 + α3 - 1
It follows that the desired value of k8i〉 is the value of k at which
θ ] Eθ (i) [Y] + Y3 Ik = m] + α] - 1
1 - θ ] - θ2 x5 + α3 - 1
θ2 Eθ (i) [Y5 + Y之 Ik = m] + α2 - 1
1 - θ ] - θ2 x5 + α3 - 1 ,
which is solved by straightforward algebra. (For instance, if the right-hand sides of the formulas above are respectively denoted A] and A2 , and Bi = Ai /(1 + Ai ), the solution is θ ] = B] (1 - B2 )/(1 - B] B2 ) and θ2 = B2 (1 - B] )/(1 - B] B2 ).)
2022-04-18