
STATS 731, 2022, Semester 2

Assignment 2 (7.5%)

Question 1 [10 marks]

This question gets you to derive a few facts about the log-uniform distribution, which is often used as an improper prior for positive quantities (especially scale parameters). In this question, you will use a proper version, with probability density

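$$p(\theta) \propto \begin{cases} \dfrac{1}{\theta}, & a \le \theta \le b, \\ 0, & \text{otherwise,} \end{cases} \tag{1}$$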

where a is a very small (but positive) lower limit, and b is a very large upper limit.

(a) [3 marks] Find the normalising constant, so that the above definition can be written with an equality sign instead of a proportionality sign.

(b) [3 marks] Let ϕ = log θ. Show that the probability density p(ϕ) is uniform, and find its limits.

(c) [4 marks] Show that the prior probability that θ is between some lower limit ℓ and upper limit 2ℓ does not depend on ℓ, as long as ℓ is between a and b/2. The log-uniform distribution is the only one with this property.
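The properties in this question can be checked numerically before (or after) deriving them. The following is a minimal sketch, not a derivation: the limits a = 1e-3 and b = 1e3, the function names, and the midpoint-rule integrator are all illustrative assumptions, and the closed forms 1/log(b/a) and log 2 / log(b/a) that appear in the code are stated as facts to be verified against your own working.

```python
import math

# Illustrative limits only (assumptions, not values from the assignment).
A_LIM, B_LIM = 1e-3, 1e3


def log_uniform_pdf(theta, a=A_LIM, b=B_LIM):
    """Normalised log-uniform density: 1 / (theta * log(b / a)) on [a, b]."""
    if a <= theta <= b:
        return 1.0 / (theta * math.log(b / a))
    return 0.0


def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def prob_doubling(ell, a=A_LIM, b=B_LIM):
    """P(ell < theta < 2 * ell) under the log-uniform prior, by quadrature."""
    return integrate(lambda th: log_uniform_pdf(th, a, b), ell, 2.0 * ell)


if __name__ == "__main__":
    expected = math.log(2.0) / math.log(B_LIM / A_LIM)
    for ell in (0.01, 1.0, 100.0):
        # The numerical probability should not change with ell.
        print(ell, prob_doubling(ell), expected)
```

Each printed row should show the same probability regardless of ℓ, which is the scale-invariance property that part (c) asks you to prove.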

Question 2 [15 marks]

Suppose that causes of death are reviewed in detail for a city in the US for a single year. It is found that 3 persons, out of a population of 200,000, died of asthma, giving a crude estimated asthma mortality rate in the city of 1.5 per 100,000 persons per year. A Poisson sampling model is often used for data of this form. Let λ represent the true underlying long-term asthma mortality rate in the city (measured in cases per 100,000 persons per year). Reviews of asthma mortality rates around the world suggest that mortality rates above 1.5 per 100,000 people are rare in Western countries, with typical asthma mortality rates around 0.6 per 100,000.

(a) [3 marks] Find a Gamma distribution that has expectation 0.6 and standard deviation 0.6, to be the prior for λ.

(b) [2 marks] Write down the sampling distribution.

(d) [2 marks] Find a 95% central credible interval for λ.

(e) [5 marks] Find and plot the posterior predictive distribution for the number of asthma deaths in the next year.
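Part (a) is a moment-matching exercise, and a Gamma prior is conjugate to a Poisson sampling model, so the pieces of this question fit together in a few lines. The sketch below is illustrative only. It assumes the shape/rate parameterisation of the Gamma, an exposure of 2 (the population of 200,000 expressed in units of 100,000 persons) in both the observed year and the future year, and the standard Gamma-Poisson result that the posterior predictive distribution is negative binomial. All function names are hypothetical.

```python
import math


def gamma_from_moments(mean, sd):
    """Return (shape, rate) of the Gamma with the given mean and sd.

    Uses mean = shape / rate and variance = shape / rate**2.
    """
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return shape, rate


def conjugate_update(shape, rate, deaths, exposure):
    """Gamma(shape, rate) prior + Poisson(exposure * lam) count -> Gamma posterior."""
    return shape + deaths, rate + exposure


def predictive_pmf(k, shape, rate, exposure):
    """Negative-binomial pmf for a future count, with lam integrated out."""
    p = rate / (rate + exposure)
    log_pmf = (math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
               + shape * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


if __name__ == "__main__":
    EXPOSURE = 2.0  # assumption: 200,000 persons = 2 units of 100,000
    prior = gamma_from_moments(0.6, 0.6)
    post = conjugate_update(*prior, deaths=3, exposure=EXPOSURE)
    print("prior (shape, rate):", prior)
    print("posterior (shape, rate):", post)
    for k in range(8):
        print(k, round(predictive_pmf(k, *post, EXPOSURE), 4))
```

Working on the log scale with `math.lgamma` keeps the pmf stable for large counts; the printed probabilities could be passed to any plotting tool for part (e).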

Question 3 [20 marks]

In this question you will derive an interesting result for the problem of ‘sinusoidal regression’, which is important in econometrics, physics, and astronomy. If you like this result, check out the book I learned it from for many more results like this. The late David MacKay described the book as being full of ‘macho integration’. This question involves algebra quite similar to Assignment 1, but I will give better hints this time, so you don’t have to spend too long on it (don’t expect it to be quick; it’s just more guided, so you don’t waste time going down paths that don’t lead to the final result).

Suppose there is an unknown signal µ(t) being measured over time, and the true signal is of the form

$$\mu(t) = A \sin\left(\frac{2\pi t}{T}\right) + B \cos\left(\frac{2\pi t}{T}\right). \tag{2}$$

A and B are amplitude parameters (the overall amplitude being √(A² + B²)), and T is the period of the oscillations, i.e., the time taken for the signal to complete a full cycle.

The data consists of N noisy measurements y = {y_1, y_2, ..., y_N} taken at times t = {t_1, t_2, ..., t_N}, with sampling distribution

$$y_i \mid A, B, T \sim \text{Normal}\left(\mu(t_i; A, B, T), \sigma^2\right). \tag{3}$$

See Figure 1 for some example simulated data, which is also provided in sinusoid.csv on Canvas.

[Figure 1 plot not reproduced; legend: True Signal µ(t), Noisy Data {y_i}]

Figure 1: Noisy measurements of a sinusoidal signal. The data is in sinusoid.csv, but only I know the true values of the model parameters, which produced the blue curve.
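To get a feel for the model in equations (2) and (3), it can help to simulate from it. The following is a minimal sketch using only the Python standard library; the parameter values, observation times, and seed are arbitrary assumptions and are not the ones used to produce sinusoid.csv.

```python
import math
import random


def signal(t, A, B, T):
    """Noise-free signal mu(t) = A sin(2 pi t / T) + B cos(2 pi t / T)."""
    phase = 2.0 * math.pi * t / T
    return A * math.sin(phase) + B * math.cos(phase)


def simulate(times, A, B, T, sigma=1.0, seed=0):
    """Draw y_i ~ Normal(mu(t_i), sigma^2) independently for each time."""
    rng = random.Random(seed)
    return [rng.gauss(signal(t, A, B, T), sigma) for t in times]


if __name__ == "__main__":
    # Hypothetical settings chosen purely for illustration.
    times = [0.1 * i for i in range(100)]
    y = simulate(times, A=2.0, B=-1.0, T=3.0)
    for t, yi in list(zip(times, y))[:5]:
        print(f"{t:.1f}  {yi:.3f}")
```

Using a dedicated `random.Random(seed)` instance keeps the simulation reproducible without touching the global random state.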

The three unknown parameters are A, B, and T, and we will assume σ is known and its value is 1. Assume improper uniform priors for A, B and T, and that the priors are all independent, i.e., p(A,B,T) ∝ 1. In most applications of models like this, the period T is of more interest than the amplitudes A and B.

(a) [2 marks] Write down the likelihood function p(y |A,B,T), and simplify it a little bit if you can. Make sure to move the product inside the exponential so it becomes a sum.

(b) [1 mark] Find an expression for the joint posterior distribution p(A,B,T |y).

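$$p(A, B, T \mid y) \propto \exp(Q_A + Q_B) \tag{4}$$

where Q_A and Q_B (Q for Quadratic) are defined as

$$Q_A = -\frac{N}{4\sigma^2} A^2 + \frac{y \cdot s}{\sigma^2} A \tag{5}$$

$$Q_B = -\frac{N}{4\sigma^2} B^2 + \frac{y \cdot c}{\sigma^2} B. \tag{6}$$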

Here, y · s is a discrete dot product of the data vector y = {y_1, ..., y_N} with the sinusoidal function evaluated at the same times s = {sin(2πt_1/T), ..., sin(2πt_N/T)}:

$$y \cdot s = \sum_{i=1}^{N} y_i \sin\left(\frac{2\pi t_i}{T}\right), \tag{7}$$

and y · c is the same thing but with the cosine function. You may use the fact that, for ‘most’ datasets, the observation times will be such that s · c ≈ 0, s · s ≈ N/2, and c · c ≈ N/2.

(d) [5 marks] Marginalise out A and B to find an expression for the marginal posterior distribution for the period T. Use the result in the Appendix to do this.

(e) [2 marks] A traditional way of estimating the period is to compute the ‘periodogram’, a function of T defined by

$$\text{Periodogram}(T) = (y \cdot s)^2 + (y \cdot c)^2, \tag{8}$$

and to see where it peaks.

If you haven’t done so already, rewrite your answer to (d) so the periodogram itself appears in the expression for the marginal posterior density for T.

(f) [5 marks] For the dataset given on Canvas in the file sinusoid.csv, compute and plot the periodogram from T = 0.01 to T = 10, and the posterior distribution for T, on the same set of axes. You may arbitrarily rescale either of these functions so that the shapes of both of them can be clearly seen.
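The dot products in equation (7) and the periodogram in equation (8) are straightforward to evaluate on a grid of trial periods. The sketch below is one possible way to do it in plain Python. It does not read sinusoid.csv (whose column layout is not described here) and instead uses a small noise-free synthetic dataset, which is an assumption made purely for illustration. The function `log_marginal_posterior` additionally assumes the approximations s · s ≈ c · c ≈ N/2 and s · c ≈ 0, under which applying the Appendix integral to A and B gives a marginal posterior proportional to exp(Periodogram(T) / (Nσ²)); treat it as something to check against your own derivation rather than as a model answer.

```python
import math


def periodogram(times, y, T):
    """(y . s)^2 + (y . c)^2 for a trial period T."""
    ys = sum(yi * math.sin(2.0 * math.pi * t / T) for t, yi in zip(times, y))
    yc = sum(yi * math.cos(2.0 * math.pi * t / T) for t, yi in zip(times, y))
    return ys ** 2 + yc ** 2


def log_marginal_posterior(times, y, T, sigma=1.0):
    """Unnormalised log p(T | y) under the stated approximations (assumption)."""
    return periodogram(times, y, T) / (len(y) * sigma ** 2)


def grid(lo, hi, n):
    """n equally spaced trial periods from lo to hi inclusive."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]


if __name__ == "__main__":
    # Synthetic, noise-free data with a hypothetical true period of 2.0.
    times = [0.1 * i for i in range(200)]
    y = [math.sin(2.0 * math.pi * t / 2.0) + 0.5 * math.cos(2.0 * math.pi * t / 2.0)
         for t in times]
    periods = grid(0.5, 10.0, 2000)
    best = max(periods, key=lambda T: periodogram(times, y, T))
    print("periodogram peaks near T =", round(best, 3))
```

Because the posterior is an exponential of the periodogram, it is usually much more sharply peaked; subtracting the maximum of the log posterior before exponentiating avoids overflow when plotting both curves on the same axes.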

Appendix: Gaussian Integral

Here is the integral you need for Question 3 part (d):

$$\int_{-\infty}^{\infty} \exp\left(ax^2 + bx\right) dx = \exp\left(-\frac{b^2}{4a}\right) \sqrt{\frac{\pi}{-a}},$$

where a must be negative.

Here is a derivation of it:

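$$\begin{aligned}
\int_{-\infty}^{\infty} \exp\left(ax^2 + bx\right) dx
&= \int_{-\infty}^{\infty} \exp\left(a\left(x^2 + \frac{b}{a}x\right)\right) dx \\
&= \int_{-\infty}^{\infty} \exp\left(a\left(x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} - \frac{b^2}{4a^2}\right)\right) dx \\
&= \int_{-\infty}^{\infty} \exp\left(a\left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a}\right) dx \\
&= \exp\left(-\frac{b^2}{4a}\right) \int_{-\infty}^{\infty} \exp\left(a\left(x + \frac{b}{2a}\right)^2\right) dx \\
&= \exp\left(-\frac{b^2}{4a}\right) \sqrt{\frac{\pi}{-a}}.
\end{aligned}$$

The last step uses the fact that the remaining integrand is an unnormalised Gaussian in x with variance −1/(2a), so its integral is √(2π · (−1/(2a))) = √(π/(−a)).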
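As a sanity check on the Gaussian integral, the closed form exp(−b²/(4a)) √(π/(−a)) can be compared with a brute-force numerical integral. This is a minimal sketch with arbitrarily chosen test values a = −2 and b = 3 (assumptions for illustration); the trapezoid rule and the finite integration window centred on the peak at x = −b/(2a) are likewise illustrative choices.

```python
import math


def gaussian_integral_closed_form(a, b):
    """exp(-b^2 / (4a)) * sqrt(pi / (-a)), valid only for a < 0."""
    if a >= 0:
        raise ValueError("a must be negative for the integral to converge")
    return math.exp(-b * b / (4.0 * a)) * math.sqrt(math.pi / (-a))


def gaussian_integral_numeric(a, b, half_width=12.0, n=200_000):
    """Trapezoid-rule estimate of the integral of exp(a x^2 + b x) over the real line.

    The window is centred on the peak at x = -b / (2a); the tails beyond
    half_width standard deviations are negligible.
    """
    centre = -b / (2.0 * a)
    scale = math.sqrt(-1.0 / (2.0 * a))  # standard deviation of the Gaussian
    lo, hi = centre - half_width * scale, centre + half_width * scale
    h = (hi - lo) / n
    f = lambda x: math.exp(a * x * x + b * x)
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, n))
    return h * total


if __name__ == "__main__":
    a, b = -2.0, 3.0  # arbitrary test values
    print(gaussian_integral_closed_form(a, b), gaussian_integral_numeric(a, b))
```

The two printed numbers should agree to many decimal places; trying a non-negative a raises an error, mirroring the requirement that a must be negative.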