CS 179: Introduction to Graphical Models: Spring 2022

Homework 6

The submission for this homework should be a single PDF file containing all of the relevant code, figures, and any text explaining your results. When coding your answers, try to write functions to encapsulate and reuse code, instead of copying and pasting the same code multiple times. This will not only reduce your programming effort, but also make it easier for us to understand and give credit for your work. Show and explain the reasoning behind your work!

In this homework, we will use Pyro’s stochastic variational inference procedure to explore a simple model for collaborative filtering; specifically, we will predict movie ratings on a (very small) subset of the MovieLens dataset.

For more examples of variational inference in Pyro, please see the course demos on Pyro (VI) and Bayesian linear regression.

Part 1: Loading the data (20 points)

First, load the training data from the provided training data file. The first line of the file is text, listing the names of M = 10 movies; the remaining lines are comma-separated integers between zero and nine, or NaN (not-a-number), indicating the value was not observed. The training data file contains a subset of the ratings of N = 200 users (one line per user). Store the training data in a 200 x 10 numpy array called Xtr. You may find one of numpy’s file-loading functions helpful.
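As a minimal sketch of the loading step, assuming numpy's `genfromtxt` is a suitable reader (the handout does not confirm which function it has in mind). The in-memory text below is a made-up stand-in for the real file, and `load_ratings` is a hypothetical helper name:

```python
import io
import numpy as np

# Hypothetical stand-in for the training file: one header line of movie
# names, then one comma-separated line of ratings per user.
sample_text = (
    "Movie A,Movie B,Movie C\n"
    "5,nan,7\n"
    "nan,3,9\n"
)

def load_ratings(source):
    """Read ratings into a float array, skipping the header line of names."""
    return np.genfromtxt(source, delimiter=",", skip_header=1)

Xtr = load_ratings(io.StringIO(sample_text))
print(Xtr.shape)            # rows = users, columns = movies
print(np.isnan(Xtr).sum())  # number of unobserved entries
```

With the real data, `source` would be the path to the training file, and the resulting array should have shape (200, 10).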

Part 2: Model Set-up (30 points)

We will use a Bayesian variant of the SVD low-rank decomposition, or latent space representation, for collaborative filtering described in CS178. If you haven’t seen this and want to read more about it, you can see these data evaluated using non-probabilistic unsupervised learning in the CS178 course notes:

https://canvas.eee.uci.edu/courses/31018/files/folder/Notes

parts 10 (clustering) and 11 (latent space models).

Our model associates a K-dimensional “position” with each user n and each movie m, denoted $U_n$ and $V_m$, respectively. Then, the degree to which user n likes movie m is a function of their dot product, $U_n \cdot V_m^T = \sum_k U_{nk} V_{mk}$. This will try to place similarly rated movies in a similar direction from the origin, with users that like those movies placed in the same direction, so that the dot product will be large.
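To make the dot-product scoring concrete, here is a small numpy illustration with K = 2. The positions in `U` and `V` are made-up values chosen for readability, not fitted ones:

```python
import numpy as np

# Hypothetical K=2 positions: 2 users (rows of U) and 3 movies (rows of V).
U = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V = np.array([[2.0, 0.0],    # same direction as user 0
              [0.0, 1.0],    # same direction as user 1
              [-1.0, 0.0]])  # opposite direction from user 0

# scores[n, m] = sum_k U[n, k] * V[m, k]
scores = U @ V.T
print(scores)
```

A user and a movie pointing the same way from the origin get a large positive score, orthogonal directions give zero, and opposite directions give a negative score.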

A small modification helps the model in practice. Since some users are more positive than others, and some movies are more widely acknowledged as good or bad, we may want to instead predict “relative” preference. For this homework, we will do this very simply, by estimating and subtracting the mean over the movies:

    movie_avg = np.nanmean(Xtr, axis=0, keepdims=True)  # ignore NaN when averaging
    Xtr -= movie_avg

and then do the same thing for the average over the users (estimate the mean over axis 1, and subtract it from Xtr). (For more sophistication, we could add these as variables in the model and reason over them as well.)
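Both centering steps can be wrapped in one function. This is a sketch only: `center_ratings` is a hypothetical helper name, and returning the two means (so predictions can later be shifted back to the original rating scale) is a design assumption rather than something the assignment asks for:

```python
import numpy as np

def center_ratings(X):
    """Subtract per-movie means, then per-user means, ignoring NaN entries."""
    X = X.astype(float).copy()                        # leave the input untouched
    movie_avg = np.nanmean(X, axis=0, keepdims=True)  # shape (1, M)
    X -= movie_avg
    user_avg = np.nanmean(X, axis=1, keepdims=True)   # shape (N, 1)
    X -= user_avg
    return X, movie_avg, user_avg

X_toy = np.array([[4.0, np.nan, 8.0],
                  [2.0, 6.0, np.nan]])
Xc, movie_avg, user_avg = center_ratings(X_toy)
print(Xc)
```

Missing entries stay NaN throughout. Note that `np.nanmean` emits a warning and returns NaN for a row or column with no observed ratings at all.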

We place simple Gaussian $\mathcal{N}(0, 1)$ priors on all the unknown values: the entries in both matrices U, V. Then, we express the probability of our observations as Gaussian with fixed standard deviation:

$$r_{nm} \sim \mathcal{N}\left(R ;\ U_n \cdot V_m^T,\ 0.1\right).$$
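The generative story implied by these priors and this likelihood can be simulated directly. The following is a plain numpy sketch (not Pyro code), assuming K = 2; the function name `sample_ratings` and the seed are illustrative choices:

```python
import numpy as np

def sample_ratings(N, M, K=2, sigma=0.1, seed=0):
    """Draw U, V from N(0, 1) priors, then ratings r_nm ~ N(U_n . V_m, sigma)."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((N, K))   # user positions, one row per user
    V = rng.standard_normal((M, K))   # movie positions, one row per movie
    mean = U @ V.T                    # mean[n, m] = U_n . V_m
    R = mean + sigma * rng.standard_normal((N, M))  # add observation noise
    return U, V, R

U, V, R = sample_ratings(N=200, M=10)
print(R.shape)
```

Simulated data of this kind can be a useful sanity check: inference run on it should place movies and users so that their dot products roughly reproduce the sampled ratings.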

To do so, define your model function to generate a collection of M two-dimensional Gaussian random variables (one for each movie). Your model will be most efficient if you use Pyro’s plate for indexing, which informs Pyro that the indexed elements are conditionally independent from one another; see the generation of the theta parameters in

http://sli.ics.uci.edu/extras/cs179/Demo%20-%20Bayesian%20Linear%20Regression%20(Pyro).html

for an example. Similarly, define a plate and a list of 2D Gaussian random variables, $U_n$ for each user n. Finally, the evidence of our model (training data) is what makes our model interesting, but un-normalized and thus difficult to reason about or sample from. Your model function should accept a data set (along with any other parameters you need); add all the passed (non-missing, i.e., non-NaN) observations to the model, as 1D Gaussian observations. (The expression V[m]@U[n] may be helpful.)
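One way to collect the non-missing observations before adding them to the model is to turn the ratings array into (user, movie, value) triples. This is a numpy sketch only; `observed_entries` is a hypothetical helper name, and how the triples are fed into the Pyro model is left open:

```python
import numpy as np

def observed_entries(X):
    """Return user indices, movie indices, and values of the non-NaN ratings."""
    n_idx, m_idx = np.nonzero(~np.isnan(X))
    return n_idx, m_idx, X[n_idx, m_idx]

X_toy = np.array([[0.5, np.nan, -0.5],
                  [np.nan, 1.0, np.nan]])
n_idx, m_idx, r = observed_entries(X_toy)

# Each triple (n_idx[i], m_idx[i], r[i]) is one observed rating; its
# predicted mean in the model would be the dot product V[m] @ U[n].
for n, m, value in zip(n_idx, m_idx, r):
    print(n, m, value)
```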

Part 3: Variational Posterior (30 points)

Next, we define the guide function. The guide function call should take exactly the same parameters as the model function, and define exactly the same random variables, except for those whose values are observed.

We will define our guide to be a product of independent, 2D Gaussians, i.e., each $V_m$ and $U_n$ are independent in $q(\cdot)$. (Note: they are of course not independent in the true posterior!) More explicitly, define Pyro sample statements for each $V_m$ and $U_n$, each with its own 2D Gaussian distribution.
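Written out, the fully factorized family described here has the form below. The parameter symbols $\mu_n, \Sigma_n$ (for users) and $\nu_m, \Lambda_m$ (for movies) are names assumed for this illustration; the handout does not fix a notation for the variational parameters:

```latex
q(U, V) \;=\; \prod_{n=1}^{N} \mathcal{N}\!\left(U_n ;\, \mu_n, \Sigma_n\right)
\;\prod_{m=1}^{M} \mathcal{N}\!\left(V_m ;\, \nu_m, \Lambda_m\right)
```

Each factor is a separate 2D Gaussian, so optimizing $q(\cdot)$ means adjusting one mean and one covariance per user and per movie.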

Define an optimizer and an inference engine as in the Bayesian regression example. Then, optimize over the parameters of your $q(\cdot)$ by iteratively calling the inference engine’s update step. This process can be quite slow; I recommend that you:

(1) Display the current means and variances of your movies, i.e., $q(V_m)$, after every few iterations. You can use the provided Gaussian plot function at the end of the homework, passing the means and covariances along with colors if desired. I also prefer to clear the plot outputs each time; again see the Bayesian L.R. demo for an example.

(2) Call the update step using a sub-sample of the full data. This can be a collection of ≈ 20 to 50 randomly selected ratings, or a subsample of ≈ 5 to 10 users’ ratings, depending on how you’ve implemented things. This should speed up each iteration and help your model converge more quickly.
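The first sub-sampling option (a random handful of observed ratings per iteration) could be sketched as follows. This is plain numpy under assumed names: `subsample_ratings` is hypothetical, and the toy array stands in for the centered training data:

```python
import numpy as np

def subsample_ratings(X, batch_size, rng):
    """Pick up to batch_size observed (non-NaN) ratings uniformly at random."""
    n_idx, m_idx = np.nonzero(~np.isnan(X))
    size = min(batch_size, n_idx.size)
    pick = rng.choice(n_idx.size, size=size, replace=False)
    return n_idx[pick], m_idx[pick], X[n_idx[pick], m_idx[pick]]

rng = np.random.default_rng(0)
X_toy = rng.standard_normal((200, 10))
X_toy[rng.random((200, 10)) < 0.5] = np.nan   # hide about half the entries

# Draw a fresh mini-batch of 30 ratings, as one would before each update.
n_b, m_b, r_b = subsample_ratings(X_toy, batch_size=30, rng=rng)
print(len(r_b))
```

Drawing a new batch at every iteration means each update sees different ratings, so over many iterations all of the observed data contributes to the fit.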

Part 4: Visualizing Uncertainty (20 points)

One of the advantages of a probabilistic model is our ability to gauge our uncertainty in our estimates.

o Re-plot your movies’ estimated posterior means and variances (the same Gaussian ellipses suggested during optimization). You should find that similar movies are located “nearby”, meaning, in the same direction from the origin (0, 0) as one another (since we measure ratings through dot products).

o For users n ∈ {40, 80, 120}, plot the users’ posterior mean and uncertainty on the same plot as the movies. How does your estimated uncertainty about the users’ representation (the covariance) compare to that for the movies? Why do you think this is?
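Alongside the plot, "same direction from the origin" can be checked numerically with the cosine of the angle between posterior means. This is an optional sketch, not part of the assignment; `cosine_similarity` is a hypothetical helper and the means below are made-up values:

```python
import numpy as np

def cosine_similarity(means):
    """Pairwise cosine of the angle between rows, measured from the origin."""
    norms = np.linalg.norm(means, axis=1, keepdims=True)
    unit = means / norms          # assumes no row is exactly the zero vector
    return unit @ unit.T

# Hypothetical posterior means for three movies (not fitted values).
V_mean = np.array([[1.0, 1.0],
                   [2.0, 2.0],
                   [-1.0, 1.0]])
sim = cosine_similarity(V_mean)
print(np.round(sim, 3))
```

Values near 1 indicate movies lying in the same direction (rated similarly by the same users), values near 0 indicate unrelated directions, and values near -1 indicate opposite ones.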

Side note: Uncertainty is useful in many settings – we can use uncertainty in our predictions to help us acquire helpful training data (active learning), or to decide on actions that balance obtaining an immediate reward with exploring other alternatives (e.g., Thompson sampling in multi-armed bandit problems).

Useful helper function: