School of Mathematics & Physics
EXAMINATION
Semester One Final Examinations, 2019
STAT3001/STAT7301 Mathematical Statistics
1. Let T denote an estimator of an unknown parameter θ based on a sample of size n from a distribution Fθ with density function f (x; θ).
(i) Show that the mean squared error (MSE) of T can be decomposed as
MSE(T) = var(T) + {bias(T)}² . [1 mark ]
(ii) Define what is meant by T being a consistent estimator of θ . [1 mark ]
(iii) Show that if var(T) and bias(T) each tend to zero as n tends to infinity, then T is a consistent estimator of θ . [2 marks]
(iv) Assuming regularity conditions hold, show that the expectation of the Score Statistic is zero,
E{∂ log L(θ)/∂θ} = 0,
and hence show that its variance is equal to s(θ), the (Fisher) expected information about θ,
var{∂ log L(θ)/∂θ} = s(θ). [1 mark ]
(v) Assuming regularity conditions hold, show that
s (θ) = E{I(θ)},
where
I(θ) = −∂² log L(θ)/∂θ² . [2 marks]
(vi) Let U be an unbiased estimator of θ. Under regularity conditions, show that the Cramér-Rao lower bound on the variance of U is given by
var(U) ≥ 1/s(θ). [3 marks]
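The bias–variance decomposition in part (i) can be checked numerically. The following sketch (not part of the exam; the true parameter, sample size, and deliberately biased estimator are arbitrary choices for illustration) simulates many replications of an estimator and verifies that the empirical MSE equals the empirical variance plus squared bias:

```python
import numpy as np

# Illustrative sketch: verify MSE(T) = var(T) + {bias(T)}^2 empirically.
rng = np.random.default_rng(0)
theta = 5.0                      # assumed true parameter (illustration only)
n, reps = 20, 10_000
samples = rng.normal(theta, 2.0, size=(reps, n))
T = samples.mean(axis=1) + 0.3   # deliberately biased estimator of theta

mse  = np.mean((T - theta) ** 2)
var  = np.var(T)                 # population-style variance across replications
bias = np.mean(T) - theta

# the decomposition holds exactly for these empirical moments
assert np.isclose(mse, var + bias ** 2)
```

Note that `np.var` with its default `ddof=0` (population variance) makes the identity hold exactly as an algebraic fact, not merely approximately.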
2. Suppose that the random variable Y has a two-component normal mixture density
f(y; θ) = p1 φ(y; µ1, σ1²) + p2 φ(y; µ2, σ2²), (1)
where p2 = 1 − p1, p1 lies in the interval [0, 1], and
θ = (p1, µ1, µ2, σ1², σ2²)ᵀ.
Here φ(y; µ, σ²) denotes the univariate normal density with mean µ and variance σ².
One way to conceptualize this mixture density is that the observation y has been made on a random variable Y that comes from Class i with probability pi, in which it has density φ(y; µi, σi²) (i = 1, 2).
Let y1 , . . . , yn denote an observed sample of size n from the mixture density (1). In order to apply the EM algorithm to the calculation of the maximum likelihood estimate of θ, the observed data-vector
g = (y1 , . . . , yn )T
is conceptualized as being incomplete, and the complete-data vector α is defined as
α = (gᵀ, zᵀ)ᵀ,
where z = (z1, . . . , zn)ᵀ and zj is defined to be one or zero according as yj arose, or did not arise, from Class 1 (j = 1, . . . , n).
(i) Write down the incomplete-data log likelihood function log L(θ). [1 mark ]
(ii) Write down the complete-data log likelihood function log Lc(θ). [1 mark ]
(iii) On applying the EM algorithm in the above EM framework, calculate the so-called Q-function on the (k + 1)th iteration of the E-step, where
Q(θ, θ(k)) = Eθ(k){log Lc(θ) | g},
showing that it is a linear function of the posterior probabilities τi(yj; θ(k)), where
τi(yj; θ(k)) = pi(k) φ(yj; µi(k), σi(k)²) / f(yj; θ(k))
is the posterior probability that the jth observation arose from Class i given yj (i = 1, 2; j = 1, . . . , n). [3 marks]
(iv) State how θ(k+1) is defined on the M-step. [2 marks]
(v) Derive the equation for p1(k+1). [3 marks]
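The E- and M-steps for the mixture (1) can be sketched numerically. The following is an illustrative implementation (not part of the exam; the simulated data, starting values, and iteration count are arbitrary assumptions), with the E-step computing the posterior probabilities and the M-step performing the weighted updates:

```python
import numpy as np

def phi(y, mu, sig2):
    # univariate normal density with mean mu and variance sig2
    return np.exp(-(y - mu) ** 2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)

# arbitrary synthetic data from a two-component mixture (illustration only)
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-2, 1.0, 150), rng.normal(3, 1.5, 100)])

p1, mu1, mu2, s1, s2 = 0.5, -1.0, 1.0, 1.0, 1.0   # arbitrary starting values
loglik = []
for _ in range(200):
    # E-step: posterior probabilities tau_1(y_j; theta^(k))
    a1, a2 = p1 * phi(y, mu1, s1), (1 - p1) * phi(y, mu2, s2)
    tau1 = a1 / (a1 + a2)
    loglik.append(np.log(a1 + a2).sum())
    # M-step: weighted updates of the mixing proportion, means, variances
    p1  = tau1.mean()
    mu1 = (tau1 * y).sum() / tau1.sum()
    mu2 = ((1 - tau1) * y).sum() / (1 - tau1).sum()
    s1  = (tau1 * (y - mu1) ** 2).sum() / tau1.sum()
    s2  = ((1 - tau1) * (y - mu2) ** 2).sum() / (1 - tau1).sum()

# EM guarantees the observed-data log likelihood never decreases
assert all(b >= a - 1e-8 for a, b in zip(loglik, loglik[1:]))
```

The monotonicity check at the end reflects the defining property of the EM algorithm: each M-step maximizes the Q-function, which cannot decrease the incomplete-data log likelihood.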
3. Let Xi1, . . . , Xini denote a random sample of size ni from a distribution assumed to be normal with mean µi and a common variance σ² (i = 1, 2). It is assumed that these two random samples are independent of each other.
(i) Write down the likelihood equation and give its solution, yielding the ML estimates of µ1 , µ2 , and σ2. Note you are not required to verify that the solution is a (global) maximizer of the likelihood function. [2 marks]
(ii) Consider the test of
H0 : µ1 = µ2 vs. H1 : µ1 ≠ µ2,
on the basis of the test statistic
T = (X̄1 − X̄2) / (s √(1/n1 + 1/n2)),
where
s² = (n1 + n2) σ̂² / (n1 + n2 − 2)
is the pooled bias-corrected estimate of σ² formed from the ML estimate σ̂² in (i).
If Tobs denotes the observed value of this test statistic T, define the P-value. [2 marks]
(iii) Explain why there is no uniformly most powerful (UMP) test of the hypothesis H0 versus H1 . [2 marks]
(iv) Derive the likelihood ratio test of H0 versus H1 , showing that the rejection of H0 for sufficiently small values of the likelihood ratio λ is equivalent to the rejection of H0 for sufficiently large values of T2 . [4 marks]
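The equivalence asked for in part (iv) can be checked numerically: for normal likelihoods, λ = (σ̂1²/σ̂0²)^(n/2), and λ^(2/n) = 1/(1 + T²/(n1 + n2 − 2)), so small λ corresponds to large T². The following sketch (not part of the exam; the two data vectors are arbitrary) verifies this identity:

```python
import numpy as np

# Illustrative check that the likelihood ratio is a decreasing function
# of T^2 via: lambda^(2/n) = 1 / (1 + T^2 / (n1 + n2 - 2)).
x1 = np.array([4.1, 5.3, 3.8, 4.9, 5.7, 4.4])   # arbitrary sample 1
x2 = np.array([6.0, 5.2, 6.8, 5.9, 6.3])        # arbitrary sample 2
n1, n2 = len(x1), len(x2)
n = n1 + n2

# residual sums of squares under H1 (separate means) and H0 (common mean)
ss1 = ((x1 - x1.mean()) ** 2).sum() + ((x2 - x2.mean()) ** 2).sum()
xbar = np.concatenate([x1, x2]).mean()
ss0 = ((x1 - xbar) ** 2).sum() + ((x2 - xbar) ** 2).sum()

lam = (ss1 / ss0) ** (n / 2)      # likelihood ratio lambda
s2 = ss1 / (n - 2)                # pooled variance estimate
T = (x1.mean() - x2.mean()) / np.sqrt(s2 * (1 / n1 + 1 / n2))

assert np.isclose(lam ** (2 / n), 1 / (1 + T ** 2 / (n - 2)))
```

The key algebraic step is ss0 = ss1 + (n1 n2 / n)(x̄1 − x̄2)², i.e. the between-group sum of squares is exactly what H0 adds to the residual sum of squares.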
4. Let α = (x1, x2, . . . , xn) be an iid sample of size n > 1 from a Uniform[0, θ] distribution with pdf
f(x | θ) = 1/θ , x ∈ [0, θ],
and 0 otherwise, where θ > 0 is an unknown parameter.
Consider a prior distribution for θ given by θ ~ Pareto(M, k) with pdf
f(θ) = kM^k / θ^(k+1) , θ > M,
and 0 otherwise, where M > 0 is a location hyperparameter and k > 1 is a shape hyperparameter.
(i) Write down the likelihood of the data α given the parameter θ . [1 mark ]
(ii) Show that the posterior pdf of θ given α = (x1, x2, . . . , xn) is
f(θ | α) ∝ 1/θ^(n+k+1) , θ > max{M, x1, x2, . . . , xn},
and identify its distribution. [3 marks]
(iii) Is the prior distribution θ ~ Pareto(M, k) conjugate for this problem? [1 mark ]
(iv) Show that the mean of any Pareto(B, a) distribution with location parameter B > 0 and shape parameter a > 1 is given by aB/(a - 1). [2 marks]
(v) Using parts (ii), (iii) and (iv), or otherwise, show that the posterior mean of θ given α is
E(θ | α) = (n + k)/(n + k − 1) · max{M, x1, x2, . . . , xn} . [1 mark ]
(vi) A well-known bias-corrected estimator for θ (Gibbons, 1974) is given by
θ̂ = ((n + 1)/n) · max{x1, x2, . . . , xn} .
By comparing the expression from part (v) to θ̂, or otherwise, give an interpretation of the effect of the hyperparameters M and k on the posterior mean as an estimator for θ. (Hint: You may use the concept of equivalent prior observations.) [2 marks]
(vii) Let θ* be the true value of θ that is generating the data. What happens to the posterior mean of θ as the sample size n → ∞ if
a. the chosen hyperparameter M value happens to be bigger than θ* ?
b. the chosen hyperparameter M value happens to be less than θ* ? [2 marks]
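The Pareto-posterior calculation can be illustrated numerically. The sketch below (not part of the exam; the data vector and hyperparameter values are arbitrary assumptions) computes the posterior mean and compares it with the bias-corrected maximum-based estimator:

```python
import numpy as np

# Illustrative sketch: posterior mean of theta under a Pareto(M, k) prior
# for Uniform[0, theta] data, versus the bias-corrected estimator.
x = np.array([2.1, 4.7, 3.3, 0.9, 4.2])   # arbitrary Uniform[0, theta] sample
n = len(x)
M, k = 1.0, 2.0                           # arbitrary hyperparameters

B = max(M, x.max())                       # posterior is Pareto(B, k + n)
post_mean = (k + n) * B / (k + n - 1)     # Pareto mean aB/(a - 1), a = k + n
bias_corrected = (n + 1) / n * x.max()    # (n+1)/n times the sample maximum
```

Here the prior acts like k extra "prior observations" capped at M: as n grows, k + n is dominated by n and the posterior mean behaves like the bias-corrected estimator, provided M does not exceed the true θ.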
5. Consider a bivariate distribution with joint pdf given by
f (x, y) = c exp{-xy - 2x - 3y} , x > 0, y > 0,
where c is a normalizing constant.
(i) Show that the conditional pdf f(x | y) is given by f(x | y) ∝ exp{−x(y + 2)} . What distribution does this correspond to? [2 marks]
Similarly, it can also be shown that the conditional pdf f(y | x) is given by
f(y | x) ∝ exp{−y(x + 3)} .
(ii) Using the above results, or otherwise, describe how you could sample from the joint distribution f (x, y). [4 marks]
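Since both full conditionals are exponential, one natural answer to (ii) is a Gibbs sampler that alternates draws from X | Y = y ~ Exp(rate = y + 2) and Y | X = x ~ Exp(rate = x + 3). A minimal sketch (not part of the exam; starting values, iteration count, and burn-in are arbitrary choices):

```python
import numpy as np

# Illustrative Gibbs sampler for the joint density
# f(x, y) proportional to exp{-xy - 2x - 3y}, x > 0, y > 0.
rng = np.random.default_rng(42)
n_iter, burn = 5000, 500
x, y = 1.0, 1.0                        # arbitrary starting values
samples = np.empty((n_iter, 2))
for t in range(n_iter):
    x = rng.exponential(1 / (y + 2))   # X | Y=y ~ Exp(rate y + 2)
    y = rng.exponential(1 / (x + 3))   # Y | X=x ~ Exp(rate x + 3)
    samples[t] = x, y
samples = samples[burn:]               # discard burn-in draws
```

Note that NumPy's `exponential` is parameterized by the scale (mean), hence the reciprocal of the rate; after burn-in the pairs are (dependent) draws approximately from f(x, y).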
6. An online poll of 61 viewers on their opinions of Season 8 of the TV series “Game of Thrones” returned the following counts:
Positive Neutral Negative Total
23 7 31 61
Let p1 , p2 and p3 = 1 - p1 - p2 be the underlying proportion of all viewers that have a positive, neutral and negative opinion of Season 8 of Game of Thrones, respectively.
(i) Assuming the data come from a multinomial distribution, show that the prior
(p1 , p2 ) ~ Dirichlet(α1 , α2 , α3 ) with hyperparameters α1 , α2 , α3 > 0 ,
is conjugate for this problem. [Recall: the joint pdf of (p1 , p2 ) ~ Dirichlet(α1 , α2 , α3 ) is
f(p1, p2) ∝ p1^(α1−1) p2^(α2−1) (1 − p1 − p2)^(α3−1) , 0 ≤ p1, p2 ≤ 1.] [3 marks]
(ii) It is given to you that
∫0^(1−q2) q1^(λ1−1) (1 − q1 − q2)^(λ3−1) dq1 ∝ (1 − q2)^(λ1+λ3−1) .
Use this to show that if (q1, q2) ~ Dirichlet(λ1, λ2, λ3) then the marginal distribution of q2 is Beta(λ2, λ1 + λ3). [3 marks]
(iii) Using part (ii), or otherwise, show that the marginal posterior for p2 is
p2 | data ~ Beta(α2 + 7, α1 + α3 + 23 + 31) . [3 marks]
(iv) The posterior mean of p2 is
E(p2 | data) = (α2 + 7) / (α1 + α2 + α3 + 61) .
Give an interpretation of this posterior mean by completing the following sentence:
“The effect of the prior on the posterior mean of p2 is like . . . ” [2 marks]
(v) In light of how the data were collected, or otherwise, argue why the posterior mean in (iv) may lead to a better estimate of the underlying proportion of neutral viewers than the frequentist estimate p̂2 = 7/61 ≈ 11%. [1 mark ]
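The conjugate update in this question is a one-line computation. The sketch below (not part of the exam; the uniform Dirichlet(1, 1, 1) prior is an assumed choice for illustration) forms the posterior and the marginal posterior mean of p2:

```python
import numpy as np

# Illustrative Dirichlet-multinomial update for the poll counts.
counts = np.array([23, 7, 31])        # positive, neutral, negative
alpha  = np.array([1.0, 1.0, 1.0])    # assumed uniform Dirichlet prior
post   = alpha + counts               # conjugacy: posterior Dirichlet params

# marginal posterior of p2 is Beta(post[1], post[0] + post[2])
a, b = post[1], post[0] + post[2]
post_mean_p2 = a / (a + b)            # Beta mean a/(a + b)
freq_p2 = counts[1] / counts.sum()    # frequentist estimate 7/61
```

With this prior the posterior mean is 8/64 = 0.125, slightly above the frequentist 7/61: the hyperparameters act like three extra pseudo-observations, one in each category, pulling the estimate toward the prior mean of 1/3.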