
CSE 250A FINAL EXAM

FALL QUARTER 2020

1. d-separation (10 pts)

Consider the following statements of marginal or conditional independence in the belief network shown below. For each statement, indicate whether it is true or false, and then justify your answer as shown in the provided examples. Note that credit will only be awarded to answers that are supported by further explanation.

Examples:

(i) P(F|A, C) ?= P(F|C)

True. There are four paths from node A to node F. They are

A → C → F,

A → E ← C → F,

A → C → G ← D → F,

A → E ← C → G ← D → F,

and each of these paths is blocked by node C. (Some of these paths are also blocked by other nodes, but the answer is already complete as written.)

(ii) P(C, D|G) ?= P(C|G) P(D|G)

False. The path C → G ← D is not blocked.
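The figure itself is not reproduced in this text, but the edges named in example (i) are enough to check both worked examples mechanically. The sketch below (Python, using networkx; note that nx.d_separated was renamed nx.is_d_separator in newer releases) encodes only those inferred edges. Node B's edges are not recoverable from the text, so the sketch checks only the two worked examples, not problems (a)-(e).

```python
# Sketch: checking the two worked examples with networkx's d-separation test.
# Edges below are inferred from the paths listed in example (i); node B's
# edges are unknown, so problems (a)-(e) cannot be checked this way.
import networkx as nx

G = nx.DiGraph([
    ("A", "C"), ("C", "F"),   # A -> C -> F
    ("A", "E"), ("C", "E"),   # A -> E <- C
    ("C", "G"), ("D", "G"),   # C -> G <- D
    ("D", "F"),               # D -> F
])

# Example (i): is A d-separated from F given {C}?  Expect True.
print(nx.d_separated(G, {"A"}, {"F"}, {"C"}))   # True

# Example (ii): is C d-separated from D given {G}?  Expect False,
# since G is an observed collider that unblocks C -> G <- D.
print(nx.d_separated(G, {"C"}, {"D"}, {"G"}))   # False
```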

Problems:

(a) P(B|F) ?= P(B|E, F)

(b) P(B, E) ?= P(B) P(E)

(c) P(A|B, F) ?= P(A|F)

(d) P(A, B|E) ?= P(A|E) P(B|E)

(e) P(C|A, E, F, G) ?= P(C|A, B, E, F, G)

2. Polytree inference (10 pts)

For the belief network shown above, consider how to efficiently compute the conditional probability P(G|A, C, F, H). This can be done in four consecutive steps in which some later steps rely on the results from earlier ones.

Complete the procedure below for this inference by showing how to compute the following probabilities. Make each computation as efficient as possible, and briefly justify each step in your solutions for full credit. Your answers should be expressed in terms of the CPTs of the belief network and (as needed) the results of previous steps. (A generic illustration of such steps is sketched after part (d).)

(a) Compute P(E|H).

(2 pts)

(b) Compute P(D|H).

(2 pts)

(c) Compute P(B|A, F).

(3 pts)

(d) Compute P(G|A, C, F, H).

(3 pts)
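The network figure for this problem is not reproduced in this text, so the actual CPT factorizations cannot be stated here. Purely as a hedged illustration of the kind of identity each step combines, steps (a) and (b) typically pair Bayes rule with marginalization over an intermediate node (both identities below hold for any distribution; the figure's edges determine how each factor reduces to a CPT):

$$P(E=e\,|\,H=h) \;=\; \frac{P(H=h\,|\,E=e)\,P(E=e)}{\sum_{e'} P(H=h\,|\,E=e')\,P(E=e')},$$

$$P(D=d\,|\,H=h) \;=\; \sum_{e} P(D=d\,|\,E=e,\,H=h)\,P(E=e\,|\,H=h),$$

with the second sum reusing the first result, just as later steps of the procedure reuse earlier ones.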

3. Naïve Bayes versus logistic regression (10 pts)

(a) Naïve Bayes model (2 pts)

Consider the belief network of discrete random variables shown above. Show how to compute the conditional probability

P(y|x1, x2, . . . , xd)

in terms of the belief network’s probability tables for P(y) and P(xi|y). Justify your steps to receive full credit.

(b) Log-odds (2 pts)

Consider the special case where all the variables in this belief network are binary-valued.

For this special case, compute the log-odds

$$\log\left[\frac{P(Y=1\,|\,x_1, x_2, \ldots, x_d)}{P(Y=0\,|\,x_1, x_2, \ldots, x_d)}\right]$$

in terms of the belief network’s probability tables. Simplify your answer as much as possible; this will be helpful for the next parts of the problem.

(c*) Linear decision boundary (3 pts)

Show that the log-odds from part (b) is a linear function of the values of x1, x2, . . . , xd. In particular, show that it can be written in the form:

$$\log\left[\frac{P(Y=1\,|\,x_1, \ldots, x_d)}{P(Y=0\,|\,x_1, \ldots, x_d)}\right] \;=\; a_0 + \sum_{i=1}^{d} a_i x_i$$

for appropriately chosen values of a0, a1, . . . , ad. Your solution should express these values in terms of the belief network’s probability tables.

Hint: since each xi is equal to either 0 or 1, it may be a useful notation to write

$$P(X_i = x_i\,|\,Y=y) \;=\; P(X_i =1\,|\,Y=y)^{x_i}\; P(X_i =0\,|\,Y=y)^{1-x_i}.$$

(d) Logistic regression (2 pts)

Consider whether it is possible to express your result for P(Y =1|x1, x2, . . . , xd) in the form of a logistic regression. In particular, are there parameters w0, w1, . . . , wd such that

$$P(Y=1\,|\,x_1, x_2, \ldots, x_d) \;=\; \sigma\!\left(w_0 + \sum_{i=1}^{d} w_i x_i\right),$$

where σ(z) = (1 + e^{−z})^{−1} is the sigmoid function? If yes, show how to choose these parameters so that this is true. If not, explain why not.

Hint: What is the inverse of the sigmoid function?
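Though not part of the exam, parts (b)-(d) admit a quick numerical check: with made-up CPT values, the exact naive Bayes posterior can be compared against a sigmoid of a linear function whose weights are read off the CPTs. A minimal sketch (the CPT numbers and dimensionality below are arbitrary illustrative choices):

```python
# Sketch: numerically checking that a binary naive Bayes posterior equals a
# logistic regression whose weights are derived from the CPTs.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
d = 4
p_y1 = 0.3                                # P(Y=1), arbitrary
theta1 = rng.uniform(0.1, 0.9, size=d)    # P(X_i=1 | Y=1), arbitrary
theta0 = rng.uniform(0.1, 0.9, size=d)    # P(X_i=1 | Y=0), arbitrary

def posterior(x):
    """Exact P(Y=1|x) from the naive Bayes factorization."""
    lik1 = p_y1 * np.prod(theta1**x * (1 - theta1)**(1 - x))
    lik0 = (1 - p_y1) * np.prod(theta0**x * (1 - theta0)**(1 - x))
    return lik1 / (lik1 + lik0)

# Weights from the log-odds: w_i collects the terms multiplying x_i,
# and w0 collects everything that does not depend on x.
w = np.log(theta1 / theta0) - np.log((1 - theta1) / (1 - theta0))
w0 = np.log(p_y1 / (1 - p_y1)) + np.sum(np.log((1 - theta1) / (1 - theta0)))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for x in product([0, 1], repeat=d):       # all 2^d binary inputs
    x = np.array(x)
    assert np.isclose(posterior(x), sigmoid(w0 + w @ x))
print("naive Bayes posterior matches logistic regression on all inputs")
```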

4. Nonnegative random variables (5 pts)

Let µ > 0. In this problem you will derive some elementary but useful properties of the exponential distribution

$$P(z) \;=\; \frac{1}{\mu}\, e^{-z/\mu}, \qquad z \geq 0,$$

for a continuous, nonnegative random variable with mean µ. (You will be exploiting these properties to solve the last problem on the exam.)

(a) Log-likelihood (1 pt)

Let {z1, z2, . . . , zT } be an i.i.d. data set of nonnegative values. Assuming each value was drawn from an exponential distribution, compute the log-likelihood

$$\mathcal{L}(\mu) \;=\; \sum_{t=1}^{T} \log P(z_t)$$

in terms of the distribution’s parameter µ.

(b) Maximum likelihood estimation (1 pt)

Show that the maximum likelihood estimate for µ is given by the sample mean of the data.

(c) Cumulative distribution (1 pt)

Calculate the cumulative distribution, given by

$$F(z) \;=\; P(Z < z),$$

when Z is exponentially distributed with mean µ.

(d) Comparison (2 pts)

Suppose that Z1 and Z2 are independent, exponentially distributed random variables with means µ1 and µ2, respectively. Show that

$$P(Z_1 > Z_2) \;=\; \frac{\mu_1}{\mu_1 + \mu_2},$$

where the left side denotes the probability that Z1 exceeds Z2 in value.

Hint: note that $P(Z_1 > Z_2) = \int_0^\infty da\; P(Z_1 = a)\, P(Z_2 < a)$, and use your result from the previous part of this problem.
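A quick Monte Carlo sanity check of parts (b)-(d) is possible (a sketch under the mean parameterization P(z) = (1/µ)e^{−z/µ} given above, including the comparison identity reconstructed in part (d); the µ values are arbitrary):

```python
# Sketch: Monte Carlo check of the exponential-distribution facts above.
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2, T = 2.0, 5.0, 1_000_000        # arbitrary illustrative values

z1 = rng.exponential(mu1, size=T)        # numpy's scale parameter is the mean
z2 = rng.exponential(mu2, size=T)

# (b) The maximum likelihood estimate of mu is the sample mean.
print(z1.mean())                         # close to 2.0

# (c) CDF: P(Z < a) = 1 - exp(-a/mu).
a = 3.0
print((z1 < a).mean(), 1 - np.exp(-a / mu1))   # the two should roughly agree

# (d) P(Z1 > Z2) = mu1 / (mu1 + mu2).
print((z1 > z2).mean(), mu1 / (mu1 + mu2))     # both close to 2/7
```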

5. EM algorithm (10 pts)

Consider the belief network shown at the right, with observed nodes {A, B, E, F} and hidden nodes {C, D}. On the problems below, simplify your answers as much as possible, and briefly justify your steps to receive full credit.

(a) Hidden node C (2 pts)

Show how to compute the posterior probability P(C|A, B, E, F) in terms of the conditional probability tables (CPTs) of the belief network.

(b) Hidden node D (2 pts)

Show how to compute the posterior probability P(D|A, B, E, F) in terms of the CPTs of the belief network.

(c) Both hidden nodes (1 pt)

Show how to compute the posterior probability P(C, D|A, B, E, F) in terms of your previous results.

(d) Log-likelihood (2 pts)

Consider a data set of T partially labeled examples $\{(a_t, b_t, e_t, f_t)\}_{t=1}^{T}$ over the observed nodes of the network. The log-likelihood of the data set is given by:

$$\mathcal{L} \;=\; \sum_{t=1}^{T} \log P(A=a_t, B=b_t, E=e_t, F=f_t).$$
Compute this expression in terms of the CPTs of the belief network.

(e) EM algorithm (3 pts)

Consider the EM updates for the CPTs of this belief network that maximize the log-likelihood in part (d). Provide these updates for the following:

(i) P(C =c)

(ii) P(D =d|A=a, B =b)

(iii) P(E =e|B =b, C =c)

For this part of the problem, it is not necessary to justify your steps; it is only necessary to state the correct updates.
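For reference, the updates in (i)-(iii) are instances of the generic EM update for CPTs of a belief network with hidden nodes: for a node X with parents pa(X), and with V_t denoting the observed evidence in the tth example,

$$P(X=x\,|\,\mathrm{pa}(X)=\pi) \;\leftarrow\; \frac{\sum_{t=1}^{T} P\big(X=x,\,\mathrm{pa}(X)=\pi\,\big|\,V_t\big)}{\sum_{t=1}^{T} P\big(\mathrm{pa}(X)=\pi\,\big|\,V_t\big)},$$

where the posterior probabilities on the right are computed from the current CPTs (here, via the results of parts (a)-(c)).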

6. Inference in HMMs (8 pts)

Consider a discrete HMM with the belief network shown below. As usual, let st ∈ {1, 2, . . . , n} and ot ∈ {1, 2, . . . , m} denote, respectively, the hidden state and observation at time t; also, let

πi = P(S1 =i),

aij = P(St+1 =j|St =i),

bik = P(Ot =k|St =i),

denote the initial distribution over hidden states, the transition matrix, and the emission matrix. In your answers you may also use bi(k) to denote the matrix element bik.

(a) Inference (4 pts)

Let $a^t$ denote the tth power (via matrix multiplication) of the transition matrix, and assume that $a^0$ denotes the identity matrix. Prove by induction or otherwise that

$$P(S_t = j) \;=\; \sum_{i=1}^{n} \pi_i \left[a^{t-1}\right]_{ij}.$$
(b) More inference (4 pts)

The forward-backward algorithm in discrete HMMs computes the probabilities

αit = P(o1, o2, . . . , ot , St =i),

βit = P(ot+1, ot+2, . . . , oT |St =i).

In terms of these probabilities (which you may assume to be given) and the parameters (aij , bik, πi) of the HMM, show how to compute the conditional probability

P(o1, o2, . . . , oT |St =i, St+1 =j)

for times 1 ≤ t < T. (Note that this is not the same conditional probability computed in lecture.) Show your work for full credit, justifying each step in your derivation, and simplifying your answer as much as possible. Hint: your result from part (a) may be useful.
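Though not part of the exam, the identity in part (a) is easy to confirm numerically. A minimal sketch with arbitrary randomly generated HMM parameters (observations play no role in part (a)):

```python
# Sketch: checking P(S_t = j) = sum_i pi_i [a^(t-1)]_ij on a random HMM,
# by comparing against brute-force enumeration of all state paths.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n, t = 3, 4                                   # number of states, query time

pi = rng.dirichlet(np.ones(n))                # initial distribution
a = rng.dirichlet(np.ones(n), size=n)         # transition matrix, rows sum to 1

# Claim from part (a): P(S_t = j) is the j-th entry of pi @ a^(t-1).
claim = pi @ np.linalg.matrix_power(a, t - 1)

# Brute force: accumulate the probability of every state path s_1, ..., s_t.
brute = np.zeros(n)
for path in product(range(n), repeat=t):
    p = pi[path[0]]
    for u in range(t - 1):
        p *= a[path[u], path[u + 1]]
    brute[path[-1]] += p

print(np.allclose(claim, brute))              # True
```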