COMP3670: Introduction to Machine Learning
Semester 2, 2022
Assignment 3 Theory Questions
Note: For the purposes of this assignment, we let lowercase p denote probability density functions (pdfs), and uppercase P denote probabilities. If a random variable Z is characterized by a probability density function p, we have that
P(a \le Z \le b) = \int_a^b p(z)\,dz
You should show your derivations, but you may use a computer algebra system (CAS) to assist with integration or differentiation.
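A minimal numerical sketch of the definition P(a ≤ Z ≤ b) = ∫ p(z) dz, using only the Python standard library. The function name prob_between and the example density p(z) = 2z on [0, 1] are hypothetical choices for illustration. It uses composite Simpson's rule rather than a CAS, so in general the result is an approximation, not an exact derivation.

```python
from typing import Callable


def prob_between(pdf: Callable[[float], float], a: float, b: float, n: int = 1000) -> float:
    """Approximate P(a <= Z <= b), the integral of pdf over [a, b], with composite Simpson's rule."""
    if n % 2 == 1:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = pdf(a) + pdf(b)
    for i in range(1, n):
        # interior points alternate between weight 4 (odd i) and weight 2 (even i)
        total += (4 if i % 2 == 1 else 2) * pdf(a + i * h)
    return total * h / 3


# Hypothetical example density on [0, 1]: p(z) = 2z, so P(0 <= Z <= 0.5) = 0.25 exactly.
def example_pdf(z: float) -> float:
    return 2 * z


print(prob_between(example_pdf, 0.0, 0.5))
```

Simpson's rule is exact (up to rounding) for polynomials of degree at most three, so it reproduces 0.25 here; for higher-degree densities it is only approximate.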
Question 1 Bayesian Inference (40 credits)
Let X be a random variable representing the outcome of a biased coin, with possible outcomes x ∈ 𝒳 = {0, 1}. The bias of the coin is itself controlled by a random variable Θ, whose outcomes θ lie in the set
\{\theta \in \mathbb{R} : 0 \le \theta \le 1\} = [0, 1]
The two random variables are related by the following conditional probability distribution of X given Θ:
p(X = 1 | Θ = θ) = θ
p(X = 0 | Θ = θ) = 1 − θ
We can use p(X = 1 | θ) as a shorthand for p(X = 1 | Θ = θ).
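The two cases above can be written compactly as p(X = x | θ) = θ^x (1 − θ)^(1 − x) for x ∈ {0, 1}. The Python sketch below evaluates this, and also the likelihood of a whole sequence of flips under the assumption that flips are conditionally independent given θ. The function names are illustrative only.

```python
def flip_likelihood(x: int, theta: float) -> float:
    """p(X = x | Θ = θ) = θ^x (1 - θ)^(1 - x) for a single flip x in {0, 1}."""
    return theta ** x * (1 - theta) ** (1 - x)


def sequence_likelihood(flips: str, theta: float) -> float:
    """p(x_{1:n} | θ) for a string such as "0101", ASSUMING flips are independent given θ."""
    result = 1.0
    for ch in flips:
        result *= flip_likelihood(int(ch), theta)
    return result


print(sequence_likelihood("0101", 0.5))   # 0.0625
print(sequence_likelihood("0101", 0.25))  # 0.03515625
```

For the sequence 0101 the likelihood is θ²(1 − θ)², which gives 1/16 at θ = 0.5.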
We wish to learn what θ is, based on experiments in which we flip the coin. Before we flip the coin, we choose a prior distribution on θ, given by
p(\theta) = 30\,\theta^2 (1 - \theta)^2
which, when plotted, looks like this:
[Figure: the prior p(θ) plotted against θ over [0, 1].]
a) (3 credits) Verify that p(θ) = 30θ²(1 − θ)² is a valid probability distribution on [0, 1], i.e. that it is always non-negative and that it is normalised.
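One way to check part a) with exact arithmetic, sketched in Python with the standard-library fractions module: expand the polynomial, then integrate term by term using ∫₀¹ t^k dt = 1/(k + 1). Non-negativity follows because θ² ≥ 0 and (1 − θ)² ≥ 0; the grid check at the end is only a sanity test, not a proof. The helper names are hypothetical.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, c2, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def integrate_01(coeffs):
    """Exact integral over [0, 1] of sum_k coeffs[k] * t**k."""
    return sum(Fraction(c, k + 1) for k, c in enumerate(coeffs))


def evaluate(coeffs, t):
    """Evaluate the polynomial at a point t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


one_minus_t = [1, -1]
prior = poly_mul(poly_mul([0, 0, 30], one_minus_t), one_minus_t)
print(prior)                # [0, 0, 30, -60, 30]
print(integrate_01(prior))  # 1

# Sanity check (not a proof): the density is non-negative on a grid over [0, 1].
assert all(evaluate(prior, i / 1000) >= 0 for i in range(1001))
```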
We flip the coin a number of times. After each coin flip, we update the probability distribution for θ to reflect our new belief about θ in light of the evidence.
Suppose we flip the coin four times and obtain the sequence of coin flips x_{1:4} = 0101. For its two subsequences 01 and 0101, denoted by x_{1:2} and x_{1:4} (and for the case before any coins are flipped), answer the following questions.
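The flip-by-flip update described above can be sketched in Python. The sketch relies on a standard fact not stated in the assignment: 30θ²(1 − θ)² is the Beta(3, 3) density, and multiplying a Beta(a, b) density by θ (for an observed 1) or by 1 − θ (for an observed 0) and renormalising gives Beta(a + 1, b) or Beta(a, b + 1). For integers a, b ≥ 1 the normalising constant is 1/B(a, b) = (a + b − 1)·C(a + b − 2, a − 1). Conditional independence of the flips given θ is assumed, and the function names are hypothetical.

```python
from math import comb
from typing import Tuple


def update(a: int, b: int, flip: str) -> Tuple[int, int]:
    """Posterior Beta parameters after observing one flip ("1" adds to a, "0" adds to b)."""
    return (a + 1, b) if flip == "1" else (a, b + 1)


def beta_const(a: int, b: int) -> int:
    """1 / B(a, b) for integers a, b >= 1, so c * θ^(a-1) (1-θ)^(b-1) integrates to 1."""
    return (a + b - 1) * comb(a + b - 2, a - 1)


a, b = 3, 3  # prior 30 θ^2 (1-θ)^2, i.e. Beta(3, 3); beta_const(3, 3) == 30
seen = ""
for flip in "0101":
    a, b = update(a, b, flip)
    seen += flip
    print(f"after {seen}: {beta_const(a, b)} θ^{a - 1} (1-θ)^{b - 1}")
```

Because each flip only increments one parameter, updating on the whole sequence at once gives the same result as updating one flip at a time.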
b) (15 credits) Compute the posterior probability density functions of θ after observing the two subsequences x_{1:2} and x_{1:4}, respectively.
c) (3 credits) Compute the expected value µ of θ before any evidence, as well as after observing x_{1:2} and x_{1:4}, respectively.
d) (3 credits) Compute the variance σ² of θ before any evidence, as well as after observing x_{1:2} and x_{1:4}, respectively.
e) (5 credits) Compute the maximum a posteriori (MAP) estimate θ_MAP of θ before any evidence, as well as after observing x_{1:2} and x_{1:4}, respectively.
Present your results in a table like the one shown below.
                 | p(θ)           | p(θ ∣ x_{1:2} = 01) | p(θ ∣ x_{1:4} = 0101)
Posterior        | 30θ²(1 − θ)²   | ?                   | ?
µ                | ?              | ?                   | ?
σ²               | ?              | ?                   | ?
θ_MAP            | ?              | ?                   | ?
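For filling in the table, the standard closed forms for a Beta(a, b) distribution are mean a/(a + b), variance ab/((a + b)²(a + b + 1)), and mode (a − 1)/(a + b − 2) for a, b > 1. The Python sketch below evaluates them exactly with the fractions module. It assumes the three columns correspond to Beta(3, 3) (the prior 30θ²(1 − θ)²) and to posteriors proportional to θ³(1 − θ)³ and θ⁴(1 − θ)⁴, i.e. the prior multiplied by θ(1 − θ) for each observed pair 01.

```python
from fractions import Fraction


def beta_summary(a: int, b: int):
    """Exact mean, variance and mode (MAP) of Beta(a, b); the mode formula needs a, b > 1."""
    mean = Fraction(a, a + b)
    variance = Fraction(a * b, (a + b) ** 2 * (a + b + 1))
    mode = Fraction(a - 1, a + b - 2)
    return mean, variance, mode


for label, (a, b) in [("p(θ)", (3, 3)), ("after 01", (4, 4)), ("after 0101", (5, 5))]:
    mean, variance, mode = beta_summary(a, b)
    print(f"{label}: µ = {mean}, σ² = {variance}, θ_MAP = {mode}")
```

Because the observed sequences contain equally many 0s and 1s, a = b in each case, so the mean and mode stay at 1/2 while the variance shrinks.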
f) (5 credits) Plot the probability distributions p(θ), p(θ | x_{1:2} = 01) and p(θ | x_{1:4} = 0101) over the interval 0 ≤ θ ≤ 1 on the same graph to compare them.
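Plotting mainly requires the three densities evaluated on a grid over [0, 1]. The standard-library sketch below produces those values, which could then be passed to any plotting library, and reports where each curve peaks. It assumes the posteriors are 140θ³(1 − θ)³ and 630θ⁴(1 − θ)⁴, obtained by multiplying the prior by θ(1 − θ) per observed pair 01 and renormalising; these should be checked against part b).

```python
densities = {
    "p(θ)": lambda t: 30 * t**2 * (1 - t) ** 2,
    "p(θ | x_{1:2} = 01)": lambda t: 140 * t**3 * (1 - t) ** 3,
    "p(θ | x_{1:4} = 0101)": lambda t: 630 * t**4 * (1 - t) ** 4,
}

grid = [i / 100 for i in range(101)]  # 101 evenly spaced points on [0, 1]
curves = {name: [f(t) for t in grid] for name, f in densities.items()}

for name, ys in curves.items():
    peak = max(ys)
    print(f"{name}: peak {peak:.4f} at θ = {grid[ys.index(peak)]}")
```

All three curves peak at θ = 0.5, with the peak rising (1.875, 2.1875, about 2.4609) as evidence accumulates, i.e. the distribution narrows.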
g) (6 credits) What behaviour would you expect of the posterior distribution p(θ | x_{1:n}) if we updated on a very long sequence of alternating coin flips x_{1:n} = 01010101...?
What would you expect µ, σ² and θ_MAP to look like for large n?
Sketch an estimate of what p(θ | x_{1:n}) would approximately look like against the other distributions.
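A quick numerical probe of this question, assuming that each observed 1 and each observed 0 adds one to the corresponding Beta parameter, so that n alternating flips (n even) starting from the Beta(3, 3) prior 30θ²(1 − θ)² give Beta(3 + n/2, 3 + n/2). By symmetry the mean and mode stay at 1/2, and the exact variance simplifies to 1/(4(n + 7)), which the sketch checks and tabulates. The function name is hypothetical.

```python
from fractions import Fraction
from math import sqrt


def alternating_variance(n: int) -> Fraction:
    """Exact variance of θ after n alternating flips (n even), starting from Beta(3, 3)."""
    a = b = 3 + n // 2
    return Fraction(a * b, (a + b) ** 2 * (a + b + 1))


for n in (0, 2, 4, 100, 10_000):
    v = alternating_variance(n)
    assert v == Fraction(1, 4 * (n + 7))  # closed form 1 / (4(n + 7))
    print(f"n = {n:>6}: σ² = {v}, σ ≈ {sqrt(v):.4f}")
```

The standard deviation shrinks roughly like 1/(2√n), so under these assumptions the posterior becomes an ever narrower spike centred at θ = 1/2.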
Question 2 Bayesian Inference on Imperfect Information (50 credits)
We have a Bayesian agent running on a computer, trying to learn what the parameter θ could be in the coin flip problem, based on observations through a noisy camera. The noisy camera takes a photo of each coin flip and reports back whether the result was a 0 or a 1. Unfortunately, the side of the coin with a “1” on it is very shiny, and the reflected light causes the camera to sometimes report back the wrong result. The probability that the camera returns a correct answer is parameterised by ϕ ∈ [0, 1]. Letting X denote the true outcome of the coin, and denoting what the camera reports by X̂, we have the following model:
[Figure: the coin shows X = 1 with probability θ and X = 0 with probability 1 − θ; the camera then reports X̂ = X with probability ϕ.]
So, we have
p(\hat{X} = 0 \mid \phi, X = 0) = \phi, \qquad p(\hat{X} = 1 \mid \phi, X = 0) = 1 - \phi
p(\hat{X} = 1 \mid \phi, X = 1) = \phi, \qquad p(\hat{X} = 0 \mid \phi, X = 1) = 1 - \phi
parameter ϕ. Let X̂_{1:n} be a sequence of coin flips as observed by the camera.
a) (5 credits) Briefly comment on how the camera behaves for ϕ = 1, ϕ = 0.5 and ϕ = 0. How would you expect this to change the way the agent updates its prior to a posterior on θ, given an observation from the camera? (No equations required.)
Simplify your expression.
to compare them. Comment on how the shape of the distribution changes with ϕ. Explain your observations.
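Under the camera model of this question, the probability that the camera reports a 1 is θϕ + (1 − θ)(1 − ϕ): either the coin shows 1 and is reported correctly, or it shows 0 and is misreported. The Python sketch below uses this to compute a posterior mean of θ on a grid. For illustration it additionally assumes that the agent knows ϕ, that readings are independent given θ, and that the agent starts from the same prior 30θ²(1 − θ)² as Question 1; the grid size and function names are arbitrary choices.

```python
def p_report_one(theta: float, phi: float) -> float:
    """Probability the camera reports 1: coin shows 1 and is read correctly,
    or coin shows 0 and is misread."""
    return theta * phi + (1 - theta) * (1 - phi)


def trapezoid(ys, h):
    """Trapezoid-rule integral of equally spaced samples ys with spacing h."""
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def posterior_mean(reports: str, phi: float, n_grid: int = 1001) -> float:
    """Posterior mean of θ given camera reports such as "1101", computed on a grid."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    h = 1 / (n_grid - 1)
    weights = []
    for t in grid:
        w = 30 * t**2 * (1 - t) ** 2  # assumed prior, as in Question 1
        for r in reports:
            q = p_report_one(t, phi)
            w *= q if r == "1" else 1 - q
        weights.append(w)
    z = trapezoid(weights, h)  # normalising constant of the unnormalised posterior
    return trapezoid([t * w for t, w in zip(grid, weights)], h) / z


for phi in (1.0, 0.5, 0.0):
    print(phi, round(posterior_mean("1111", phi), 4))
```

After four reported 1s, the means come out close to 0.7 for ϕ = 1 (reports trusted), 0.5 for ϕ = 0.5 (reports carry no information, so the posterior equals the prior) and 0.3 for ϕ = 0 (reports are always inverted).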
Question 3 Relating Random Variables (10 credits)
Let X be a random variable on [0, 1] with probability density function
p(x) = x^2 + x +
Let Y be a random variable on [2, 3] such that Y = X² + 2. Find the probability density function for Y.
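Since g(x) = x² + 2 is increasing on [0, 1], the standard change-of-variables formula gives p_Y(y) = p_X(g⁻¹(y)) · |d g⁻¹(y)/dy|, with g⁻¹(y) = √(y − 2) and d g⁻¹(y)/dy = 1/(2√(y − 2)). The Python sketch below takes the density of X as an argument and is exercised only on hypothetical densities (2x and 3x² on [0, 1]) rather than the one given above; the function names are illustrative.

```python
from math import sqrt
from typing import Callable


def pdf_of_y(p_x: Callable[[float], float], y: float) -> float:
    """Density of Y = X^2 + 2 at y, given the density p_x of X on [0, 1].

    Uses x = sqrt(y - 2) (the inverse of x -> x^2 + 2 on [0, 1])
    and |dx/dy| = 1 / (2 sqrt(y - 2)); returns 0 outside (2, 3].
    """
    if not 2 < y <= 3:
        return 0.0
    x = sqrt(y - 2)
    return p_x(x) / (2 * x)


def integrate_midpoint(f: Callable[[float], float], a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule integral of f over [a, b]; never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Hypothetical X with p_X(x) = 2x gives a uniform Y on [2, 3].
print(pdf_of_y(lambda x: 2 * x, 2.7))
# Hypothetical X with p_X(x) = 3x^2: the density of Y should still integrate to 1.
print(integrate_midpoint(lambda y: pdf_of_y(lambda x: 3 * x**2, y), 2.0, 3.0))
```

The midpoint rule is used because p_Y can blow up at y = 2 when p_X(0) ≠ 0, and it avoids evaluating that endpoint.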
2022-10-06