CS 179: Introduction to Graphical Models: Spring 2022 Homework 6
The submission for this homework should be a single PDF file containing all of the relevant code, figures, and any text explaining your results. When coding your answers, try to write functions to encapsulate and reuse code, instead of copying and pasting the same code multiple times. This will not only reduce your programming efforts, but also make it easier for us to understand and give credit for your work. Show and explain the reasoning behind your work!
In this homework, we will use Pyro’s stochastic variational inference procedure to explore a simple model for collaborative filtering; specifically, to predict movie ratings on a (very small) subset of the MovieLens dataset.
For more examples of variational inference in Pyro, please see the course demos on Pyro (VI) and Bayesian linear regression.
Part 1: Loading the data (20 points)
First, load the training data file. The first line of the file is text, listing the names of M = 10 movies; the remaining lines are comma-separated integers between zero and nine, or NaN (not-a-number), indicating that the value was not observed. The training data file contains a subset of the ratings of N = 200 users (one line per user). Store the training data in a 200 × 10 numpy array. You may find one of numpy’s text-loading functions (for instance np.genfromtxt, which handles missing values) helpful.
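As a minimal loading sketch (the file name was not reproduced above, so a small in-memory stand-in is used here; np.genfromtxt converts unparseable tokens such as "nan" into np.nan):

```python
import io
import numpy as np

# Placeholder for the (unnamed) training file: a header line of movie
# names, then comma-separated ratings with "nan" marking missing values.
csv_text = "MovieA,MovieB,MovieC\n5,nan,3\nnan,2,7\n"

# skip_header=1 drops the line of movie names; in the homework, pass the
# actual file path instead of a StringIO object.
Xtr = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(Xtr.shape)  # (2, 3) for this toy file; (200, 10) for the real one
```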
Part 2: Model Set-up (30 points)
We will use a Bayesian variant of the SVD low-rank decomposition, or latent space representation, for collaborative filtering described in CS178. If you haven’t seen this and want to read more about it, you can see these data evaluated using non-probabilistic unsupervised learning in the CS178 course notes:
https://canvas.eee.uci.edu/courses/31018/files/folder/Notes
parts 10 (clustering) and 11 (latent space models).
Our model associates a K-dimensional “position” with each user n and each movie m, denoted U_n and V_m, respectively. Then, the degree to which user n likes movie m is a function of their dot product, U_n · V_m^T = Σ_k U_{nk} V_{mk}. This will try to place similarly rated movies in a similar direction from the origin, with users that like those movies placed in the same direction, so that the dot product will be large.
A small modification helps the model in practice. Since some users are more positive than others, and some movies are more widely acknowledged as good or bad, we may want to instead predict “relative” preference. For this homework, we will do this very simply, by estimating and subtracting the mean over the movies:
    movie_avg = np.nanmean(Xtr, axis=0, keepdims=True)  # ignore NaN when averaging
    Xtr -= movie_avg
and then do the same thing for the average over the users (estimate the mean over axis 1, and subtract from X.) (For more sophistication, we could add these as variables in the model and reason over them as well.)
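The two centering steps together might look like the following (toy data stand in for the real ratings):

```python
import numpy as np

# Toy stand-in for the training ratings (NaN = unobserved).
Xtr = np.array([[1.0, np.nan, 3.0],
                [2.0, 4.0, np.nan]])

# Subtract the per-movie average (axis 0), ignoring NaNs...
movie_avg = np.nanmean(Xtr, axis=0, keepdims=True)
Xtr = Xtr - movie_avg

# ...then the per-user average (axis 1), again ignoring NaNs.
user_avg = np.nanmean(Xtr, axis=1, keepdims=True)
Xtr = Xtr - user_avg
```

After the second step each user’s observed ratings average to zero, so the model only has to explain relative preferences.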
We place simple Gaussian N(0, 1) priors on all the unknown values: the entries in both matrices U, V. Then, we express the probability of our observations as Gaussian with fixed standard deviation:

    r_nm ~ N( U_n · V_m^T , 0.1 )
To do so, define your model function to generate a collection of M two-dimensional Gaussian random variables (one for each movie). Your model will be most efficient if you use Pyro’s plate for indexing, which informs Pyro that the indexed elements are conditionally independent from one another; see the generation of the theta parameters in
http://sli.ics.uci.edu/extras/cs179/Demo%20-%20Bayesian%20Linear%20Regression%20(Pyro).html
for an example. Similarly, define a plate and a list of 2D Gaussian random variables, U_n for each user n. Finally, the evidence of our model (training data) is what makes our model interesting, but un-normalized and thus difficult to reason about or sample from. Your model function should accept a data set (along with any other parameters you need); add all the passed (non-missing, i.e., non-NaN) observations to the model as 1D Gaussian random variables whose means are the corresponding dot products. (The expression V[m] @ U[n] may be helpful.)
Part 3: Variational Posterior (30 points)
Next, we define the guide function. The guide function should take exactly the same parameters as the model function, and define exactly the same random variables, except for those whose values are observed.
We will define our guide to be a product of independent, 2D Gaussians, i.e., each V_m and U_n are independent in q(.). (Note: they are of course not independent in the true posterior!) More explicitly, define Pyro sample statements for each V_m and U_n whose means and variances are learnable parameters, declared with pyro.param.
Define an optimizer and an SVI inference engine as in the Bayesian regression example. Then, optimize over the parameters of your q(.) by iteratively calling the SVI object’s step method. This process can be quite slow; I recommend that you:
(1) Display the current means and variances of your movies, i.e., q(Vm ), after every few iterations. You can use the provided Gaussian plot function at the end of the homework, passing the means and covariances along with colors if desired. I also prefer to clear the plot outputs each time; again see the Bayesian L.R. demo for an example.
(2) Call step using a sub-sample of the full data. This can be a collection of ≈ 20–50 randomly selected ratings, or a subsample of ≈ 5–10 users’ ratings, depending on how you’ve implemented things. This should speed up each iteration and help your model converge more quickly.
Part 4: Visualizing Uncertainty (20 points)
One of the advantages of a probabilistic model is our ability to gauge our uncertainty in our estimates.
o Re-plot your movies’ estimated posterior means and variances (the same Gaussian ellipses suggested during optimization). You should find that similar movies are located “nearby”, meaning, in the same direction from the origin (0, 0) as one another (since we measure ratings through dot products).
o For users n ∈ {40, 80, 120}, plot the users’ posterior mean and uncertainty on the same plot as the movies. How does your estimated uncertainty about the users’ representation (the covariance) compare to that for the movies? Why do you think this is?
Side note: Uncertainty is useful in many settings – we can use uncertainty in our predictions to help us acquire helpful training data (active learning), or to decide on actions that balance obtaining an immediate reward with exploring other alternatives (e.g., Thompson sampling in multi-armed bandit problems).
Useful helper function:
    if colors is None: cmap = cm.get_cmap('brg'); colors = [cmap(i*1./n) for i in range(n)]
    for i in range(n):
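Only fragments of the helper appear above; a sketch of what such a Gaussian-ellipse plotting function might look like (the name plot_gaussians, the 2-standard-deviation radius, and the Cholesky-based ellipse construction are all assumptions):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

def plot_gaussians(means, covs, colors=None, n_std=2.0, ax=None):
    """Plot one 2-D covariance ellipse (plus mean marker) per Gaussian."""
    n = len(means)
    if colors is None:
        cmap = plt.get_cmap('brg')
        colors = [cmap(i * 1. / n) for i in range(n)]
    if ax is None:
        ax = plt.gca()
    t = np.linspace(0, 2 * np.pi, 100)
    circle = np.stack([np.cos(t), np.sin(t)])        # unit circle, shape (2, 100)
    for i in range(n):
        L = np.linalg.cholesky(np.asarray(covs[i]))  # cov = L @ L.T
        pts = np.asarray(means[i])[:, None] + n_std * (L @ circle)
        ax.plot(pts[0], pts[1], color=colors[i])
        ax.scatter(*means[i], color=colors[i], marker='x')
    return ax
```

Pass the posterior means and covariances of q(V_m) (and later q(U_n)) to visualize the movies and users together.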
2022-05-16