G14CST/MATH4007 —Computational
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
G14CST/MATH4007 —Computational
Statistics
Chapter II - Simulation
2 Simulation
By simulation, we mean the use of a model to produce artificial ‘results’
rather than obtaining them from a physical experiment. When should we
use stochastic models as opposed to deterministic ones? A lack of knowledge
about a deterministic system is often accounted for by adding some random-
ness to the models. Whenever we have a model, we can simulate from the
model to learn about it. Note however, that we learn about the model, not
the physical system!
In order to simulate from a stochastic model, we need a source of random-
ness. Most simulation methods require a sequence of uniformly distributed
random variables as the starting point from which more complicated simula-
tions can be derived. How could we generate U [0, 1] random variables? The
approach is to generate a sequence of pseudo-random numbers, which are
generated deterministically, but appear indistinguishable from random when
statistical tests are applied.
Definition: A sequence of pseudo-random numbers {ui} is a determin-
istic sequence of numbers in [0, 1] having the same statistical properties as a
similar sequence of random numbers.
Note that the sequence {ui} is reproducible provided the first term in the
sequence is known (which is known as the seed). A good sequence would be
“unpredictable to the uninitiated”. That is, if we were only shown the se-
quence produced, and didn’t know the deterministic formula used to produce
it, we would not be able to predict its future behaviour, and the sequence
would appear to be completely random.
2.1 Congruential generators
The following sequence
0.155109197, 0.701655716, 0.813951522, 0.568807691, 0.087282449,
was generated by starting with the number N0 = 78953536 then calculating
N1 = (aN0) mod M , N2 = (aN1) mod M , N3 = (aN2) mod M , and then
setting U1 = N1/M , U2 = N2/M , U3 = N3/M , etc.
The two constants are
a = 216 + 3 = 65539, M = 231 = 2147483648.
N1 = remainder when 65539 × 78953536 is divided by 2147483648. Then
U1 = N1/2147483648.
The general form of a congruential generator is
Ni = (aNi−1 + c) mod M,
Ui = Ni/M, where integers a, c ∈ [0,M − 1]
If c = 0, it is called amultiplicative congruential generator (otherwise, mixed).
N0 is called the seed.
The numbers produced are restricted to the M possible values
0,
1
M
,
2
M
, . . . ,
M − 1
M
.
Clearly, they are rational numbers, but if M is large they will practically
cover the reals in [0, 1]. As soon as some Ni repeats, say, Ni = Ni+T , then
the whole subsequence repeats, i.e. Ni+t = Ni+T+t, t = 1, 2, . . .. The least
such T is called the period.
A good generator will have a long period. The period cannot be longer
than M and also depends on a and c.
2.2 Generation from non-U(0, 1)
Now suppose we have a sequence U1, U2, U3, . . . of independent uniform ran-
dom numbers in [0, 1], and we want to generate X1, X2, . . . distributed inde-
pendently and identically from some other specified distribution. That is, we
wish to transform the U1, U2, . . . sequence into a X1, X2, . . . sequence. There
are often many ways of doing this. A good, efficient algorithm should be fast
because millions of random numbers from the desired distribution may be
required.
2.2.1 The inversion method
Let X be any continuous random variable and define Y = FX(X), where FX
is the distribution function of X : FX(x) = P (X ≤ x).
Claim: Y ∼ U [0, 1].
Proof: Y ∈ [0, 1] and the distribution function of Y is
FY (y) = P (Y ≤ y) = P
(
FX(X) ≤ y
)
= P
(
X ≤ F−1X (y)
)
= FX
(
F−1X (y)
)
= y
which is the distribution function of a uniform random variable on [0, 1].
So, whatever the distribution of X , Y = FX(X) is uniformly distributed
on [0, 1].
The inversion method turns this backwards. Let U = FX(X), so that
X = F−1X (U). To generate from FX(·), take a single uniform variable U ,
calculate F−1X (U), and then X will have the required distribution:
P (X ≤ x) = P (F−1X (U) ≤ x)
= P
(
U ≤ FX(x)
)
= FX(x)
space
Example: exponential distribution
Let X have the exponential distribution with parameter λ (mean 1λ). I.e.
f(x) = λe−λx (x > 0)
F (x) =
∫ x
0
λe−λz dz = [−e−λz]x0 = 1− e−λx.
Set U = 1− e−λX and solve for X
X = −1
λ
ln(1− U).
Note that 1 − U is uniformly distributed on [0, 1], so we might just as well
use
X = −1
λ
lnU.
2.2.2 Discrete distributions
The inversion method works for discrete random variables in the following
sense. Let X be discretely distributed with k distinct possible values xi
having probabilities pi, i = 1, . . . , k. So
P (X = xi) = pi,
k∑
i=1
pi = 1.
Then FX(x) =
∑
xi≤x
pi is a step function.
Inversion gives X = xi if
∑
xj
∑
xj≤xi
pj, which selects each
possible value of X with the correct probability.
Think of this as splitting [0, 1] into intervals of length pi. The interval in
which U falls is the value of X , which occurs with probability pi as required.
Example
Let X ∼ Bi(4, 0.3). The probabilities are
P (X = 0) = 0.2401, P (X = 1) = 0.4116, P (X = 2) = 0.2646
P (X = 3) = 0.0756, P (X = 4) = 0.0081.
The algorithm says X = 0 if 0 ≤ U ≤ 0.2401,
X = 1 if 0.2401 < U ≤ 0.6517,
X = 2 if 0.6517 < U ≤ 0.9163,
X = 3 if 0.9163 < U ≤ 0.9919,
X = 4 if 0.9919 < U ≤ 1.
One way to perform the algorithm is this: let U ∼ U(0, 1).
1. Test U ≤ 0.2401. If true, return X = 0.
2. If false, test U ≤ 0.6517. If true, return X = 1.
3. If false, test U ≤ 0.9163. If true, return X = 2.
4. If false, test U ≤ 0.9919. If true, return X = 3.
5. If false, return X = 4.
Consider the speed of this. The expected number of steps (which roughly
equates to speed) is
1× 0.2401 + 2× 0.4116 + 3× 0.2646 + 4× 0.0756 + 4× 0.0081 = 2.1919.
To speed things up we can rearrange the order so that the later steps are
less likely.
1. Test U ≤ 0.4116. If true return X = 1.
2. If false, test U ≤ 0.6762. If true return X = 2.
3. If false, test U ≤ 0.9163. If true return X = 0.
4. and 5. as before.
The expected number of steps is 1× 0.4116 + 2× 0.2646 + 3× 0.2401 + 4×
(0.0756 + 0.0081) = 1.9959: Roughly a 10% speed increase.
2.2.3 Transformations
Here are a few useful relationships which can be used to transform simulated
random variables between related distributions:
(a) If U ∼ U(0, 1) set V = (b− a)U + a then V ∼ U(a, b) where a < b.
(b) If Yi are iid exponential with parameter λ then
X =
n∑
i=1
Yi = −1
λ
n∑
i=1
logUi = −1
λ
log
(
n∏
i=1
Ui
)
has a Ga(n,λ) distribution.
(c) If X1 ∼ Ga(p, 1), X2 ∼ Ga(q, 1), X1 and X2 independent then Y =
X1/(X1 +X2) ∼ Be(p, q).
(d) Composition: if
f(x) =
r∑
i=1
pifi(x)
where
∑
pi = 1 and each fi is a density, then we can sample from f
by first sampling an index j from the set {1, . . . , r} with correspond-
ing probability distribution p = {p1, . . . , pr}, and then taking a sample
from fj .
f(x) is a mixture density (we are “mixing” (averaging) the component
densities with weights given by the probability distribution of the in-
dex).
2.2.4 The Box-Mu¨ller algorithm for the normal distribution
We cannot generate a normal random variable by inversion, because FX is
not known in closed form (nor is its inverse). The Box–Mu¨ller method (1958)
can be used instead. Take U1, U2 ∼ U(0, 1) independently. Calculate
X1 =
√
−2 lnU1 cos(2πU2),
X2 =
√
−2 lnU1 sin(2πU2).
Then X1 and X2 are independent N(0, 1) variables. You will be invited to
prove this on the exercise sheet.
2.3 Rejection Algorithm
Fundamental Theorem of Simulation:
Simulating
X ∼ f(x)
is equivalent to simulating
(X,U) ∼ U{(x, u) : 0 < u < f(x)}.
Note that fX,U(x, u) = {0∫
f(x, u)du =
∫ f(x)
0
du = f(x),
so that f is the marginal density of the joint distribution (X,U) ∼ U{(x, u) :
0 < u < f(x)}.
The problem with this result is that simulating uniformly from the set
{(x, u) : 0 < u < f(x)}
may not be possible. A solution is to simulate the pair (X,U) in a bigger
set, where simulation is easier, and then take the pair if the constraint is
satisfied.
0.0 0.2 0.4 0.6 0.8 1.0
0.
0
0.
5
1.
0
1.
5
2.
0
2.
5
x
D
en
si
ty
Suppose that f(x) is zero outside the interval [a, b] (so that
∫ b
a f(x)dx = 1)
and that f is bounded above by m. We can then simulate the pair
(Y, U) ∼ U{[a, b]× [0, m]}
( by simulating Y ∼ U [a, b], U ∼ U [0, m] independently). We then
accept the pair (Y, U) if 0 < U < f(Y ), in which case we return X = Y as
our simulated value from the distribution of X .
This results in the correct distribution for the simulated value X , since
P(X ≤ x) = P(Y ≤ x|U < f(Y ))
=
P(Y ≤ x, U < f(Y ))
P(U < f(Y ))
=
∫ x
a
∫ f(y)
0 f(y, u)dudy∫ b
a
∫ f(y)
0 f(y, u)dudy
=
∫ x
a
f(y)dy.
Note: we can use the rejection algorithm even if we only know f up
to a normalising constant (as is often the case in Bayesian statistics, for
example). To see this, suppose f(x) = f1(x)c , where c is the normalizing
constant of f (which may be unknown, or very difficult to compute). Then,
if we use f1(x) instead of f(x) in the algorithm, the value of m will change
(it is the bound on f1(x) not f(x)), but the accepted values will still be from
the correct distribution — this is because the acceptance probability of a
candidate value Y is the same in both cases.
Example: Sampling from a beta distribution
Consider sampling from X ∼ Beta(α, β) for α, β > 1 which has pdf
f(x) =
Γ(α + β)
Γ(α)Γ(β)
xα−1(1− x)β−1 0 < x < 1.
We note
f(x) ∝ f1(x) = xα−1(1− x)β−1 0 < x < 1
and that M = sup
0
α + β − 2 (mode) and
hence
M =
(α− 1)α−1(β − 1)β−1
(α + β − 2)α+β−2 .
The rejection algorithm is
1. Generate Y ∼ U(0, 1) and U ∼ U(0,M).
2. If U ≤ f1(Y ) = Y α−1(1 − Y )β−1 then let X = Y (accept) else go to 1
(reject).
2.3.1 Generalising the Rejection Idea
If the support of f is not finite, then bounding it within a rectangle will not
work. Instead of using a box to bound the density f(x) (ie requiring f(x) < m
for some constant m) we can use a function m(x) such that f(x) ≤ m(x) for
all x.
Suppose the larger bounding set is
L = {(y, u) : 0 < u < m(y)}.
Then all we require is that simulation of a uniform from L is feasible. Note
1. The closer m is to f the more efficient our algorithm.
2. Because m(x) ≥ f(x), m is usually not a probability density. We write
m(x) = Mg(x), where
∫
m(x)dx =
∫
Mg(x)dx = M
for some density g.
This suggests a more general implementation of the fundamental theorem:
Let X ∼ f(x) and let g(x) be a density function that
satisfies f(x) ≤Mg(x) for some constant M ≥ 1. Then,
to simulate X ∼ f , it is sufficient to generate
Y ∼ g and U |Y = y ∼ U(0,Mg(y))
and set X = Y if U < f(Y ).
Again, this results in the correct distribution for the simulated value X .
(Proof on next page.)
However, when implementing the algorithm in practice, it is more conve-
nient to rescale the uniform distribution such that the support of U doesn’t
depend on the particular value y sampled. Hence, the rejection algorithm is
usually stated in a slightly modified form:
Rejection Algorithm
If g is such that f/g is bounded, and therefore we
can find a value M such that Mg(x) ≥ f(x) for
all x, then we can simulate from f as follows:
1. Generate Y from density g, and U from U(0, 1).
2. If U < f(Y )/Mg(Y ) set X = Y . Otherwise, re-
turn to step 1.
We keep sampling new pairs Y and U until the condition is satisfied.
Proof: validity of the general rejection algorithm.
Let X be a random variable denoting an accepted sample - i.e. a sample Y
from g which met the acceptance criterion. We need to show that the cdf of
X is the one we want.
P(X ≤ x) = P
(
Y ≤ x|U < f(Y )
Mg(Y )
)
=
P
(
Y ≤ x, U < f(Y )Mg(Y )
)
P
(
U < f(Y )Mg(Y )
)
=
∫ x
−∞
∫ f(y)
Mg(y)
0 g(y)f(u) dudy∫∞
−∞
∫ f(y)
Mg(y)
0 g(y)f(u) dudy
=
∫ x
−∞
f(y)
Mg(y)g(y) dy∫∞
−∞
f(y)
Mg(y)g(y) dy
=
1
M
∫ x
−∞ f(y) dy
1
M
∫∞
−∞ f(y) dy
=
∫ x
−∞
f(y) dy,
which is the definition of the cdf of the random variable with density f that
we initially wanted to simulate from.
Example: Sampling from a beta distribution revisited
Use rejection to sample from X ∼ Beta(α, β). Let g(y) = αyα−1, 0 < y < 1,
then
f1(x)
g(x)
=
(1− x)β−1
α
is bounded if and only if β ≥ 1
Then M = sup
x
{
f1(x)
g(x)
}
=
1
α
occurs at x = 0.
1. Simulate Y with pdf g(y) = αyα−1, 0 < y < 1 and U ∼ U(0, 1).
2. If U <
f1(Y )
Mg(Y )
=
(1− Y )β−1(
1
α
)
α
= (1 − Y )β−1 then set X = Y else go to
1.
The full algorithm is:
1. Generate U ∼ U(0, 1) and Z ∼ U(0, 1). Let Y = Z 1α .
2. If U ≤ (1− Y )β−1 then set X = Y else go to 1.
0.0 0.2 0.4 0.6 0.8 1.0
0.
0
0.
2
0.
4
0.
6
0.
8
1.
0
x
D
en
si
ty
2026-01-24