BCS/CSC 229: Computer Models of Human Perception and Cognition Homework Assignment #2

Published: 2022-10-26



Instructions: Answer all questions below. Include all requested calculations and graphs. Also include the Python code that you wrote to answer the questions. When writing text or equations, please write NEATLY!

(0) (Part A) At the top of the document that you turn in, place your name and the date. (Part B) Next, please take the honor pledge. That is, write (by hand using a pen): “I affirm that I have not given or received any unauthorized help on this assignment, and that this work is my own.” Then sign your name.

(1) [WARNING: This problem is mathematically challenging. Don’t be surprised if you struggle with it. Indeed, it may be smart to first work on the other homework problems, and then return to this problem if time permits.] (Adapted from Problem 2.4 from the draft of the textbook by Ma, Kording, and Goldreich) Many Bayesian inference problems involve a product of two or more Gaussians. A convenient property of Gaussians is that their product is also Gaussian. In this problem, we will lead you through an example to derive this property yourself. Consider an observer who infers a stimulus $s$ from a measurement $x$. Suppose that the measurement distribution $p(x \mid s)$ (this is the likelihood function) is a Gaussian distribution with standard deviation $\sigma$, and the stimulus distribution $p(s)$ (this is the prior distribution) is a Gaussian with mean $\mu$ and standard deviation $\sigma_s$.

(a) Write down the equations for $p(x \mid s)$ and $p(s)$.

(b) Use Bayes’ rule to write down the equation for the posterior $p(s \mid x)$. Substitute $p(x \mid s)$ and $p(s)$, but do not simplify.

The numerator is a product of two Gaussians. The denominator $p(x)$ is a normalization factor that ensures that the integral of the posterior distribution equals 1. For now, we will ignore the denominator and focus on the numerator.

(d) Expand the two quadratic terms in the exponent.

(e) Rewrite the exponent in the form $as^2 + bs + c$ (where $a$, $b$, and $c$ are constants independent of $s$).

(f) Show that any quadratic function of the form $as^2 + bs + c$ can be written as:

$$a\left(s + \frac{b}{2a}\right)^2 + c - \frac{b^2}{4a}.$$

This operation is known as “completing the square”.

(g) Rewrite your expression obtained in (e) by completing the square.

(h) Apply the rule $e^A e^B = e^{A+B}$ to rewrite this into the form

$$e^{Z}\, e^{-\frac{(s - \mu_{\text{combined}})^2}{2\sigma_{\text{combined}}^2}}.$$

Express $\mu_{\text{combined}}$ and $\sigma_{\text{combined}}$ in terms of $x$, $\sigma$, $\mu$, and $\sigma_s$.

(i) Why is $\mu_{\text{combined}}$ the same as the maximum-a-posteriori (MAP) estimate of the stimulus (i.e., the $s$ that maximizes the posterior distribution $p(s \mid x)$)?

(j) Recall that $p(s \mid x)$ is a distribution and that its integral should therefore be equal to 1. However, the expression that you obtained in (e) is not properly normalized because we ignored $p(x)$. Modify the expression such that it is properly normalized, without using $p(x)$. (Hint: Does $e^{Z}$ depend on $s$?)
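As a numerical sanity check on the property derived in Problem (1), the following sketch multiplies a Gaussian likelihood and a Gaussian prior on a fine grid, normalizes the product numerically, and compares it with a Gaussian whose mean and variance are the standard precision-weighted combination. It is meant to check a derivation, not replace it. The numbers (x = 30, sigma = 5, mu = 20, sigma_s = 4), the grid limits, the helper name `gaussian_pdf`, and the assumption that the measurement distribution is centered on s are all illustrative choices, not part of the assignment.

```python
import numpy as np

def gaussian_pdf(value, mean, sd):
    """Gaussian probability density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((value - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Illustrative values (assumptions chosen only for this check).
x, sigma = 30.0, 5.0       # measurement and measurement noise
mu, sigma_s = 20.0, 4.0    # prior mean and prior standard deviation

# Fine grid, wide enough that both Gaussians are negligible at the edges.
step = 0.01
s = np.arange(-40.0, 80.0 + step / 2, step)

# Numerator of Bayes' rule, normalized numerically so that it integrates to 1.
product = gaussian_pdf(x, s, sigma) * gaussian_pdf(s, mu, sigma_s)
posterior = product / (product.sum() * step)

# Precision-weighted combination (standard result for a product of Gaussians).
precision = 1.0 / sigma**2 + 1.0 / sigma_s**2
mu_combined = (x / sigma**2 + mu / sigma_s**2) / precision
sigma_combined = np.sqrt(1.0 / precision)

# Largest pointwise gap between the numerical and the closed-form posterior.
closed_form = gaussian_pdf(s, mu_combined, sigma_combined)
max_abs_error = np.max(np.abs(posterior - closed_form))
print(mu_combined, sigma_combined, max_abs_error)
```

If your expressions from part (h) are correct, substituting the same numbers into them should reproduce the printed `mu_combined` and `sigma_combined`.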

(2) (Adapted from Problem 2.12 from the draft of the textbook by Ma, Kording, and Goldreich) An observer infers a stimulus $s$ from a measurement $x$. Let’s say that on a particular trial, the measurement is $x = 30$. The measurement distribution $p(x \mid s)$ (this is the likelihood function) is Gaussian with standard deviation $\sigma = 5$. Assume a Gaussian stimulus distribution $p(s)$ (this is the prior distribution) with mean 20 and standard deviation 4. We are now going to calculate the posterior distribution $p(s \mid x)$ using Python.

(a) Define a vector of possible $s$-values: 0, 0.2, 0.4, . . . , 40.

(b) Compute the likelihood function and the prior on this vector of values of $s$. [Hint: The values of the prior distribution will not sum to one (instead, they should sum to approximately 1/stepsize, where stepsize = 0.2). That is because we are approximating a continuous distribution by a discrete distribution. A similar comment applies to the likelihood function, though keep in mind that the likelihood function is not a distribution, and thus its values do not need to sum to one.]

(c) Multiply the likelihood and the prior. In Python, elementwise multiplication of two NumPy arrays can be achieved using the “*” operator.

(d) Divide this product by its sum over all s (normalization step).

(e) Convert this posterior probability mass function into a probability density function by dividing by the step size you used in your vector of s-values (e.g., 0.2).

(f) Plot the likelihood, prior, and posterior in the same plot. Is the posterior wider or narrower than the likelihood and prior? Do you expect this based on the equations we discussed?

(g) Change the standard deviation of the measurement distribution to a very large value. What happens to the posterior? Can you explain this?

(h) Change the standard deviation of the measurement distribution to a very small value. What happens to the posterior? Can you explain this?
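Steps (a) through (e) of Problem (2) can be sketched in NumPy as follows. This is one possible layout under stated assumptions, not a reference solution: the helper `gaussian_pdf` and all variable names are hypothetical, and the measurement distribution is assumed to be centered on the stimulus.

```python
import numpy as np

def gaussian_pdf(value, mean, sd):
    """Gaussian probability density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((value - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

step = 0.2
s = np.arange(0.0, 40.0 + step / 2, step)    # (a) 0, 0.2, ..., 40

x, sigma = 30.0, 5.0                         # measurement and noise SD
mu, sigma_s = 20.0, 4.0                      # prior mean and SD

likelihood = gaussian_pdf(x, s, sigma)       # (b) a function of s for fixed x
prior = gaussian_pdf(s, mu, sigma_s)

product = likelihood * prior                 # (c) elementwise multiplication
posterior_pmf = product / product.sum()      # (d) normalize so the values sum to 1
posterior_pdf = posterior_pmf / step         # (e) convert mass to density

# Summary statistics of the numerical posterior.
posterior_mean = np.sum(s * posterior_pmf)
posterior_sd = np.sqrt(np.sum((s - posterior_mean) ** 2 * posterior_pmf))
print(prior.sum(), posterior_mean, posterior_sd)
```

Plotting `likelihood`, `prior`, and `posterior_pdf` against `s` (for example with matplotlib) gives the figure requested in (f). Rerunning with a much larger or much smaller `sigma` covers (g) and (h), although a very small `sigma` may need a finer grid to resolve the likelihood.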

(3) (Adapted from Problem 2.13 from the draft of the textbook by Ma, Kording, and Goldreich) Repeat Question (2), but instead of using a single value of the measurement $x$, start with a fixed value of $s = 10$. From this value of $s$, draw 10 values of $x$ from the measurement distribution. (Although you know the true value of $s$, pretend that you don’t know this value. Your goal is to infer the distribution of $s$ based on each individual value of $x$. You will do this inference 10 times.) You should observe that, from trial to trial, the likelihood function and posterior probability density function “jump around”. Observe how the posterior shifts under the influence of the “jumping” likelihood function and stationary prior. Explain.
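The repeated inference in Problem (3) can be sketched by stacking the ten trials as rows of a two-dimensional array and letting NumPy broadcasting evaluate every likelihood on the same grid. The random seed, the helper `gaussian_pdf`, and the variable names are arbitrary assumptions. The noise and prior parameters are carried over from Problem (2).

```python
import numpy as np

def gaussian_pdf(value, mean, sd):
    """Gaussian probability density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((value - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)               # arbitrary seed, for repeatability
step = 0.2
s_grid = np.arange(0.0, 40.0 + step / 2, step)

s_true, sigma = 10.0, 5.0                    # fixed stimulus and noise SD
mu, sigma_s = 20.0, 4.0                      # prior mean and SD

prior = gaussian_pdf(s_grid, mu, sigma_s)    # the same prior on every trial
measurements = rng.normal(s_true, sigma, size=10)

# One row per trial: shape (10, number of grid points).
likelihoods = gaussian_pdf(measurements[:, None], s_grid[None, :], sigma)
products = likelihoods * prior
posteriors = products / (products.sum(axis=1, keepdims=True) * step)

# Location of each posterior peak on the grid.
map_estimates = s_grid[np.argmax(posteriors, axis=1)]
for x, s_map in zip(measurements, map_estimates):
    print(f"x = {x:6.2f}   MAP = {s_map:5.1f}")
```

Each row of `likelihoods` and `posteriors` can be plotted against `s_grid` to see the likelihood move from trial to trial while the prior stays fixed.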

(4) (Adapted from Problem 2.14 from the draft of the textbook by Ma, Kording, and Goldreich) Continuing from Questions (2) and (3), generate a distribution of maximum-a-posteriori (MAP) and maximum likelihood (ML) estimates by:

(a) drawing an s from the stimulus distribution;

(b) drawing a single $x$ from the measurement distribution, and calculating the posterior distribution.

(c) For each of 1000 repetitions of (a) and (b), plot the MAP estimate (y-axis) against the true stimulus $s$ (x-axis). On a separate graph, plot the MLE (i.e., measurement $x$) against the true stimulus $s$.

(d) Repeat (a), (b), and (c) using different values of the measurement (noise) standard deviation relative to the prior standard deviation. When the noise standard deviation is very small, the MAP and MLE plots should look the same. Why? When the noise standard deviation is very large, the MAP plot looks flat, whereas the MLE plot looks very scattered. Why?
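A possible way to organize the simulation in Problem (4) is sketched below. The function name `simulate_estimates`, its arguments, the seed, and the three noise levels are hypothetical choices. Because only the location of the posterior peak is needed here, the sketch works with the unnormalized log posterior, which avoids numerical underflow when the noise standard deviation is very small. This is a substitution for the explicit normalization used in Problem (2), not a requirement of the assignment.

```python
import numpy as np

def simulate_estimates(sigma, n_trials=1000, mu=20.0, sigma_s=4.0, seed=0):
    """Return (true stimuli, ML estimates, grid-based MAP estimates)."""
    rng = np.random.default_rng(seed)
    step = 0.2
    s_grid = np.arange(0.0, 40.0 + step / 2, step)

    s_true = rng.normal(mu, sigma_s, size=n_trials)   # (a) stimulus from the prior
    x = rng.normal(s_true, sigma)                     # (b) one measurement per stimulus

    # Unnormalized log posterior on the grid, one row per trial.
    log_prior = -0.5 * ((s_grid - mu) / sigma_s) ** 2
    log_likelihood = -0.5 * ((x[:, None] - s_grid[None, :]) / sigma) ** 2
    log_posterior = log_likelihood + log_prior

    map_estimates = s_grid[np.argmax(log_posterior, axis=1)]
    ml_estimates = x   # for a Gaussian likelihood, the ML estimate is the measurement
    return s_true, ml_estimates, map_estimates

# Small, intermediate, and large noise relative to the prior SD of 4.
results = {}
for sigma in (0.5, 5.0, 50.0):
    s_true, ml, map_ = simulate_estimates(sigma)
    results[sigma] = (s_true, ml, map_)
    print(f"sigma = {sigma:5.1f}   SD(ML) = {ml.std():6.2f}   SD(MAP) = {map_.std():6.2f}")
```

Scatter plots of `map_` against `s_true` and of `ml` against `s_true` for each entry of `results` give the graphs requested in (c) and (d).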

(5) (Adapted from Problem 3.7 from the draft of the textbook by Ma, Kording, and Goldreich) In Chapters 2 and 3 (of the Ma, Kording, and Goldreich textbook), we were able to derive analytical expressions for the posterior distribution. For more complex psychophysical tasks, however, analytical solutions often do not exist. In such a case, we can use numerical methods to approximate the distribution of interest. To get some familiarity with this method, we will reconsider the cue combination experiment described in this chapter, but we will now compute the distribution of MAP estimates using numerical methods. We assume that the experimenter introduces a cue conflict between the auditory and the visual stimuli: $s_A = 5$ and $s_V = 10$. The standard deviation of the auditory and of the visual noise is $\sigma_A = 2$ and $\sigma_V = 1$, respectively. We assume a flat (uniform) prior distribution over $s$.

(a) Randomly draw an auditory measurement $x_A$ and a visual measurement $x_V$ from their respective distributions. (It’s okay if a measurement has a negative value.)

(b) Plot the corresponding elementary likelihood functions, $p(x_A \mid s)$ and $p(x_V \mid s)$, in one figure.

(c) Calculate the combined likelihood function, $p(x_A, x_V \mid s)$, by numerically multiplying the elementary likelihood functions in Python. Plot this function.

(d) Calculate the posterior distribution by normalizing the combined likelihood function. Plot this distribution in the same figure as the likelihood functions.

(e) Use Python to find the MAP estimate of $s$ (i.e., the value of $s$ at which the posterior distribution is maximal).

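(f) Compare with the MAP estimate of $s$ computed from Eq. (3.3) using the measurements drawn in (a). For convenience, here is Eq. (3.3):

$$\hat{s}_{\text{MAP}} = \frac{\dfrac{x_A}{\sigma_A^2} + \dfrac{x_V}{\sigma_V^2}}{\dfrac{1}{\sigma_A^2} + \dfrac{1}{\sigma_V^2}}$$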

(g) In the above, we simulated a single trial and computed the observer’s MAP estimate of $s$, given the noisy measurements on that trial. If an analytical solution does not exist for the distribution of MAP estimates, we can repeat the above procedure many times to approximate this distribution. Here, we practice this method even though an analytical solution is available in this case. Draw 100 pairs $(x_A, x_V)$ and numerically compute the observer’s MAP estimate of $s$ for each pair as in (e).

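(h) Compute the mean of the MAP estimates obtained in (g) and compare with the mean estimate predicted using Eq. (3.5). For convenience, here is Eq. (3.5):

$$w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2}, \qquad w_V = \frac{1/\sigma_V^2}{1/\sigma_A^2 + 1/\sigma_V^2}, \qquad \langle \hat{s} \rangle = w_A s_A + w_V s_V$$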

(i) Make a histogram of the MAP estimates (in Python, use the “numpy.histogram” function).

(j) Relative auditory bias is defined as the mean MAP estimate minus the true auditory stimulus, divided by the true visual stimulus minus the true auditory stimulus. Compute relative auditory bias for your set of estimates.
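The numerical procedure in Problem (5) can be sketched as below for all 100 trials at once. The grid limits and step, the seed, the helper `gaussian_pdf`, and the variable names are assumptions made for illustration. The closed-form comparison uses the standard inverse-variance weighting for two Gaussian cues under a flat prior.

```python
import numpy as np

def gaussian_pdf(value, mean, sd):
    """Gaussian probability density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((value - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(2)               # arbitrary seed
s_A, s_V = 5.0, 10.0                         # conflicting true stimuli
sigma_A, sigma_V = 2.0, 1.0                  # auditory and visual noise SDs
n_trials = 100

step = 0.01                                  # assumed grid resolution
s_grid = np.arange(-10.0, 25.0 + step / 2, step)

# (a), (g): one auditory and one visual measurement per trial.
x_A = rng.normal(s_A, sigma_A, size=n_trials)
x_V = rng.normal(s_V, sigma_V, size=n_trials)

# (b)-(d): elementary likelihoods, their product, and the normalized posterior.
like_A = gaussian_pdf(x_A[:, None], s_grid[None, :], sigma_A)
like_V = gaussian_pdf(x_V[:, None], s_grid[None, :], sigma_V)
combined = like_A * like_V
posterior = combined / (combined.sum(axis=1, keepdims=True) * step)  # flat prior

# (e): numerical MAP estimate on each trial.
map_numeric = s_grid[np.argmax(posterior, axis=1)]

# (f), (h): inverse-variance weights and the closed-form counterparts.
w_A = (1.0 / sigma_A**2) / (1.0 / sigma_A**2 + 1.0 / sigma_V**2)
w_V = 1.0 - w_A
map_analytic = w_A * x_A + w_V * x_V
predicted_mean = w_A * s_A + w_V * s_V

# (i), (j): histogram counts and relative auditory bias.
counts, bin_edges = np.histogram(map_numeric, bins=10)
relative_bias = (map_numeric.mean() - s_A) / (s_V - s_A)
print(map_numeric.mean(), predicted_mean, relative_bias)
```

Note that `numpy.histogram` only returns the counts and bin edges. Drawing the histogram requires a separate plotting call.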