COMP3670: Introduction to Machine Learning


Semester 2, 2022

Assignment 3 Theory Questions

Note: For the purposes of this assignment, we let lowercase p denote probability density functions (pdfs), and uppercase P denote probabilities. If a random variable Z is characterized by a probability density function p, we have that

P(a ≤ Z ≤ b) = ∫_a^b p(z) dz
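As a quick numerical illustration of this relationship, the sketch below approximates the integral with a midpoint rule. The density p(z) = 2z on [0, 1] and the helper names integrate and example_pdf are assumptions made only for this example; they are not part of the assignment.

```python
def integrate(pdf, a, b, n=10_000):
    """Approximate the integral of pdf over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(n))


def example_pdf(z):
    # Hypothetical density on [0, 1], chosen only for illustration: p(z) = 2z.
    return 2.0 * z if 0.0 <= z <= 1.0 else 0.0


total = integrate(example_pdf, 0.0, 1.0)   # normalisation check: should be close to 1
prob = integrate(example_pdf, 0.25, 0.5)   # approximates P(0.25 <= Z <= 0.5)
print(total, prob)
```

For this example density the exact value is P(0.25 ≤ Z ≤ 0.5) = 0.5² − 0.25² = 0.1875, which the numerical result should match closely.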

You should show your derivations, but you may use a computer algebra system (CAS) to assist with integration or differentiation.

Question 1: Bayesian Inference (40 credits)

Let X be a random variable representing the outcome of a biased coin, with possible outcomes x ∈ {0, 1}. The bias of the coin is itself controlled by a random variable Θ, with outcomes θ in the set

{θ ∈ R : 0 ≤ θ ≤ 1}

The two random variables are related by the following conditional probability distribution function of X given Θ:

p(X = 1 | Θ = θ) = θ

p(X = 0 | Θ = θ) = 1 − θ

We can use p(X = 1 | θ) as a shorthand for p(X = 1 | Θ = θ).

We wish to learn what θ is, based on experiments by flipping the coin. Before we flip the coin, we choose a prior distribution for θ,

p(θ) = 30θ²(1 − θ)²

which, when plotted, looks like this:


a)  (3 credits) Verify that p(θ) = 30θ²(1 − θ)² is a valid probability distribution on [0, 1] (i.e. that it is always non-negative and that it is normalised).

We flip the coin a number of times. After each coin flip, we update the probability distribution for θ to reflect our new belief about θ, based on the evidence.
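The update step described here can be sketched numerically on a grid of θ values: multiply the current density by the likelihood of the new observation, then renormalise. This is only a minimal sketch under stated assumptions. The flat prior, the observation sequence, and the helper names normalise and update are hypothetical choices for illustration, not the prior or the data used in this question.

```python
def normalise(weights, h):
    """Scale grid values so they integrate to 1 (Riemann sum with step h)."""
    total = sum(weights) * h
    return [w / total for w in weights]


def update(density, grid, h, x):
    """One Bayes step: posterior is proportional to likelihood times prior, for x in {0, 1}."""
    likelihood = [t if x == 1 else 1.0 - t for t in grid]
    return normalise([lik * d for lik, d in zip(likelihood, density)], h)


n = 1000
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]   # midpoints of [0, 1]

# Assumption for illustration only: a flat prior, NOT the prior used in this question.
density = normalise([1.0] * n, h)

for x in [1, 1, 0]:                        # hypothetical observations
    density = update(density, grid, h, x)

mean = sum(t * d for t, d in zip(grid, density)) * h
print(round(mean, 3))
```

With a flat prior and the observations 1, 1, 0, the exact posterior is proportional to θ²(1 − θ), whose mean is 3/5, so the printed value should be close to 0.6.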

Suppose we flip the coin four times, and obtain the sequence of coin flips x_{1:4} = 0101. For its two subsequences 01 and 0101, denoted by x_{1:2} and x_{1:4} (and for the case before any coins are flipped), complete the following questions.

b)  (15 credits) Compute the posterior probability density functions of θ after observing the two subsequences x_{1:2} and x_{1:4}, respectively.

c)  (3 credits) Compute the expected value µ of θ before any evidence, as well as after observing the two subsequences x_{1:2} and x_{1:4}, respectively.

d)  (3 credits) Compute the variance σ² of θ before any evidence, as well as after observing the two subsequences x_{1:2} and x_{1:4}, respectively.

e)  (5 credits) Compute the maximum a posteriori estimate θ_MAP of θ before any evidence, as well as after observing the two subsequences x_{1:2} and x_{1:4}, respectively.

Present your results in a table like the one shown below.







            p(θ)              p(θ | x_{1:2} = 01)    p(θ | x_{1:4} = 0101)

PDF         30θ²(1 − θ)²

µ

σ²

θ_MAP












f)  (5 credits) Plot each of the probability distributions p(θ), p(θ | x_{1:2} = 01), p(θ | x_{1:4} = 0101) over the interval 0 ≤ θ ≤ 1 on the same graph to compare them.

g)  (6 credits) What behaviour would you expect of the posterior distribution p(θ | x_{1:n}) if we updated on a very long sequence of alternating coin flips x_{1:n} = 01010101...?

What would you expect µ, σ², θ_MAP to look like for large n?

Sketch/draw an estimate of what p(θ | x_{1:n}) would approximately look like against the other distributions.

Question 2: Bayesian Inference on Imperfect Information (50 credits)

We have a Bayesian agent running on a computer, trying to learn what the parameter θ could be in the coin flip problem, based on observations through a noisy camera. The noisy camera takes a photo of each coin flip and reports back whether the result was a 0 or a 1. Unfortunately, the side of the coin with a "1" on it is very shiny, and the reflected light causes the camera to sometimes report back the wrong result. The probability that the camera returns a correct answer is parameterised by ϕ ∈ [0, 1]. Let X denote the true outcome of the coin, and let X̂ denote what the camera reports.


[Diagram: the true outcomes X = 0 and X = 1 mapped to the camera reports X̂ = 0 and X̂ = 1]

So, we have

p(X̂ = 0 | ϕ, X = 0) = ϕ

parameter ϕ. Let x̂_{1:n} be a sequence of coin flips as observed by the camera.

a)  (5 credits) Briefly comment on how the camera behaves for ϕ = 1, ϕ = 0.5, ϕ = 0. How do you expect this would change how the agent updates its prior to a posterior on θ, given an observation of X̂? (No equations required.)

Simplify your expression.

to compare them. Comment on how the shape of the distribution changes with ϕ. Explain your observations.

Question 3: Relating Random Variables (10 credits)

Let X be a random variable, on [0, 1], with probability density function

p(x) = x² + x +

Let Y be a random variable on [2, 3], such that Y = X² + 2. Find the probability density function for Y.
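For a monotone transformation Y = g(X), the distribution of Y follows from P(Y ≤ y) = P(X ≤ g⁻¹(y)). The sketch below checks this relationship by simulation. It assumes, purely for illustration, that X is uniform on [0, 1], which is not the density given in this question; the helper names analytic_cdf and empirical_cdf are likewise hypothetical.

```python
import math
import random

random.seed(0)

# Assumption for illustration only: X is uniform on [0, 1],
# which is NOT the density given in this question.
samples = [random.random() ** 2 + 2.0 for _ in range(200_000)]   # Y = X^2 + 2


def analytic_cdf(y):
    # For uniform X and 2 <= y <= 3: P(Y <= y) = P(X <= sqrt(y - 2)) = sqrt(y - 2).
    return math.sqrt(y - 2.0)


def empirical_cdf(y):
    # Fraction of simulated Y values that are at most y.
    return sum(s <= y for s in samples) / len(samples)


for y in (2.25, 2.5, 2.75):
    print(y, round(empirical_cdf(y), 3), round(analytic_cdf(y), 3))
```

Under this uniform assumption, differentiating the analytic CDF gives the density 1 / (2√(y − 2)) on (2, 3]; the same steps apply to other densities for X, with a different result.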
