
CSCI 567, Summer 2022

Theory Assignment 4

Instructions

Submission:  Assignment submission will be via courses.uscden.net. By the submission date, there will be a folder named 'Theory Assignment 4' set up in which you can submit your files. Please be sure to follow all directions outlined here.

You can submit multiple times, but only the last submission counts. That means if you finish some problems and want to submit something first and update later when you finish, that's fine. In fact you are encouraged to do this: that way, if you forget to finish the homework on time or something happens (remember Murphy's Law), you still get credit for whatever you have turned in.

Problem sets must be typewritten or neatly handwritten when submitted. In both cases, your submission must be a single PDF. It is strongly recommended that you typeset with LaTeX. There are many free integrated LaTeX editors that are convenient to use (e.g., Overleaf, ShareLaTeX). Choose the one(s) you like the most. The tutorial "Getting to Grips with LaTeX" is a good start if you do not know how to use LaTeX yet.

Please also follow the rules below:

• The file should be named firstname_lastname_USCID.pdf (e.g., Don_Quijote_de_la_Mancha_8675309045.pdf).

• Do not have any spaces in your file name when uploading it.

• Please include your name and USCID in the header of your report as well.

Collaboration:  You may discuss with your classmates. However, you need to write your own solutions and submit separately. Also, in your report you need to list, for each problem, with whom you discussed it. Please consult the syllabus for what is and is not acceptable collaboration. Review the rules on academic conduct in the syllabus: a single instance of plagiarism can adversely affect you significantly more than you could stand to gain.

Notes on notation:

• Unless stated otherwise, scalars are denoted by lowercase letters in normal font, vectors by lowercase letters in bold font, and matrices by capital letters in bold font.

• ∥·∥ denotes the L2-norm unless specified otherwise, i.e., ∥·∥ = ∥·∥2.

Problem 1  K-means Clustering                                                                          (32 points)

In Lecture 15 (slide 5), we describe the loss function of k-means as follows:

F({γnk}, {µk}) = ∑n ∑k γnk ∥xn − µk∥₂²

Refer to slide 10 of Lecture 15 for the update rule of k-means.
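As a concrete illustration of the alternating update rule, here is a minimal NumPy sketch of the k-means loop that records the loss after each iteration. The function and variable names are ours, not from the lecture slides; the recorded losses should be non-increasing, which is the property question 1.1 asks you to prove.

```python
import numpy as np

def kmeans_loss(X, centers, assign):
    # F({gamma_nk}, {mu_k}) = sum_n ||x_n - mu_{k(n)}||_2^2
    return float(np.sum((X - centers[assign]) ** 2))

def kmeans(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct random data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    losses = []
    for _ in range(n_iter):
        # Step 1: assign each point to its nearest center (update gamma_nk).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Step 2: move each center to the mean of its assigned points (update mu_k).
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
        losses.append(kmeans_loss(X, centers, assign))
    return centers, assign, losses
```

Each of the two steps can only lower (or keep) F, so `losses` is non-increasing; since there are only finitely many possible assignments, the iterates must eventually stop changing, which is the substance of questions 1.1 and 1.2.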

1.1  Show that the loss function decreases in each iteration of k-means.                          (8 points)

1.2  Argue that k-means converges in a finite number of iterations.                                 (8 points)

1.3  We update our loss function to use the l1 norm instead of the l2 norm, so that

F({γnk}, {µk}) = ∑n ∑k γnk ∥xn − µk∥₁

How does this compare to the original loss function on data containing outliers?           (4 points)

1.4  Show that the cluster centers are then the medians of the points in each cluster.     (12 points)
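To build intuition for questions 1.3 and 1.4, the following hypothetical one-dimensional example contrasts the l2-optimal center (the mean) with the l1-optimal center (the median) when a single outlier is present; the numbers are made up for illustration.

```python
import numpy as np

# A small cluster with one outlier (hypothetical data).
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

mean, median = x.mean(), np.median(x)

# The l2-optimal center (the mean) is dragged toward the outlier,
# while the l1-optimal center (the median) stays with the bulk of the data.
l2_loss = lambda c: np.sum((x - c) ** 2)
l1_loss = lambda c: np.sum(np.abs(x - c))
```

Here the mean is 22.0 while the median is 3.0: one outlier moves the l2 center far from the other four points, whereas the l1 center is unaffected. This is the sense in which the l1 loss is more robust to outliers.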

Problem 2  Gaussian Mixture Model                                                                 (32 points)

Let X1, ..., Xn ∈ Rᵈ be independent, identically distributed points sampled from a mixture of two normal (Gaussian) distributions. They are drawn independently from the probability density function (PDF)

p(x) = θN1(x) + (1 − θ)N2(x),  where  N1(x) = (2π)^(−d/2) e^(−∥x−µ1∥²/2)  and  N2(x) = (2π)^(−d/2) e^(−∥x−µ2∥²/2)

are the PDFs for the isotropic multivariate normal distributions N(µ1, I) and N(µ2, I), respectively. The parameter θ ∈ (0, 1) is called the mixture proportion. In essence, we flip a biased coin to decide whether to draw a point from the first Gaussian (with probability θ) or the second (with probability 1 − θ). Let Zi ∈ {1, 2} denote which Gaussian the point Xi was drawn from. The learning algorithm sees the sample points as input, but does not know Zi.

Our goal is to find the maximum likelihood estimates of the three unknown distribution parameters θ ∈ (0, 1), µ1 ∈ Rᵈ, and µ2 ∈ Rᵈ from the sample points X1, ..., Xn. Unlike MLE for one Gaussian, it is not possible to give explicit analytic formulas for these estimates. Instead, we develop a variant of k-means clustering which (often) converges to the correct maximum likelihood estimates of θ, µ1, and µ2. This variant doesn't assign each point entirely to one cluster; rather, each point is assigned an estimated posterior probability of coming from normal distribution 1.

2.1  Let τi = P(Zi = 1 | Xi). That is, τi is the posterior probability that point Xi has Zi = 1. Use Bayes' Theorem to express τi in terms of Xi, θ, µ1, µ2, and the Gaussian PDFs N1(x) and N2(x). To help you with question 2.3, also write down a similar formula for 1 − τi, which is the posterior probability that Zi = 2.

2.2  Write down the log-likelihood function, l(θ, µ1, µ2; X1, ..., Xn) = ln p(X1, ..., Xn), as a summation. Note: it doesn't simplify much.                                                                               (5 points)

2.3  Express ∂l/∂θ in terms of θ and τi, i ∈ {1, ..., n}, and simplify as much as possible. There should be no normal PDFs explicitly in your solution, though the τi's may implicitly use them. Hint: Recall that (ln f(x))′ = f′(x)/f(x).                                                                               (5 points)

2.4  Express ∇µ1 l in terms of µ1 and τi, Xi, i ∈ {1, ..., n}. Do the same for ∇µ2 l (but in terms of µ2 rather than µ1). Again, there should be no normal PDFs explicitly in your solution, though the τi's may implicitly use them. Hint: It will help (and get you part marks) to first write ∇µ1 N1(x) as a function of N1(x), x, and µ1.                                                                               (7 points)

2.5  We conclude: if we know µ 1, µ2, and θ, we can compute the posteriors τi .  On the other hand, if we know the τi’s, we can estimate µ 1, µ2, and θ by using the derivatives in 2.3 and 2.4 to find the maximum likelihood estimates. This leads to the following k-means-like algorithm.

• Initialize τ1, τ2, ..., τn to arbitrary values in the range [0, 1].

• Repeat the following two steps.

1. Update the Gaussian cluster parameters: for fixed values of τ1, τ2, ..., τn, update µ1, µ2, and θ .

2. Update the posterior probabilities: for fixed values of µ1, µ2, and θ, update τ1, τ2, ..., τn .

In part 2.1, you wrote the update rule for step 2. Using your results from parts 2.3 and 2.4, write down the explicit update formulas for step 1.                                                        (9 points)
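The alternating scheme above can be sketched in code. The following is a minimal NumPy sketch assuming the standard EM-style updates for this model (the very formulas that parts 2.1, 2.3, and 2.4 ask you to derive); the function name and its arguments are illustrative, not part of the assignment.

```python
import numpy as np

def em_two_gaussians(X, n_iter=50, tau0=None, seed=0):
    """Alternating algorithm from 2.5 for a mixture of two isotropic
    unit-variance Gaussians, assuming the standard EM updates."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Initialize the posteriors tau_1, ..., tau_n arbitrarily in [0, 1].
    tau = rng.uniform(size=n) if tau0 is None else np.asarray(tau0, dtype=float)
    for _ in range(n_iter):
        # Step 1: for fixed tau, update theta, mu1, mu2 (MLE given posteriors).
        theta = tau.mean()
        mu1 = (tau[:, None] * X).sum(axis=0) / tau.sum()
        mu2 = ((1 - tau)[:, None] * X).sum(axis=0) / (1 - tau).sum()
        # Step 2: for fixed parameters, update tau via Bayes' theorem;
        # the (2*pi)^(-d/2) normalization factors cancel in the ratio.
        n1 = np.exp(-0.5 * np.sum((X - mu1) ** 2, axis=1))
        n2 = np.exp(-0.5 * np.sum((X - mu2) ** 2, axis=1))
        tau = theta * n1 / (theta * n1 + (1 - theta) * n2)
    return theta, mu1, mu2, tau
```

Unlike hard k-means, each point contributes fractionally to both means through its posterior τi, which is what makes this a "soft" clustering algorithm.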