School of Mathematics & Physics
EXAMINATION
Semester One Final Examinations, 2019
STAT3001/STAT7301 Mathematical Statistics
1. Let T denote an estimator of an unknown parameter θ based on a sample of size n from a distribution Fθ with density function f (x; θ).
(i) Show that the mean squared error (MSE) of T can be decomposed as
MSE(T) = var(T) + {bias(T)}² . [1 mark ]
(ii) Define what is meant by T being a consistent estimator of θ . [1 mark ]
(iii) Show that if var(T) and bias(T) each tend to zero as n tends to infinity, then T is a consistent estimator of θ . [2 marks]
(iv) Assuming regularity conditions hold, show that the expectation of the Score Statistic is zero,
E{∂ log L(θ)/∂θ} = 0,
and hence show that its variance is equal to s(θ), the (Fisher) expected information about θ,
var{∂ log L(θ)/∂θ} = s(θ). [1 mark ]
(v) Assuming regularity conditions hold, show that
s (θ) = E{I(θ)},
where
I(θ) = −∂² log L(θ)/∂θ² . [2 marks]
(vi) Let U be an unbiased estimator of θ. Under regularity conditions, show that the Cramér-Rao lower bound on the variance of U is given by
var(U) ≥ 1/s(θ). [3 marks]
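The bias–variance decomposition in part (i) can be checked numerically. The following sketch (not part of the exam; the true parameter, sample size, and deliberately biased estimator are arbitrary choices for illustration) simulates many replications of an estimator and verifies that the empirical MSE equals the empirical variance plus squared bias:

```python
import numpy as np

# Illustrative sketch: verify MSE(T) = var(T) + {bias(T)}^2 empirically.
rng = np.random.default_rng(0)
theta = 5.0                      # assumed true parameter (illustration only)
n, reps = 20, 10_000
samples = rng.normal(theta, 2.0, size=(reps, n))
T = samples.mean(axis=1) + 0.3   # deliberately biased estimator of theta

mse  = np.mean((T - theta) ** 2)
var  = np.var(T)                 # population-style variance across replications
bias = np.mean(T) - theta

# the decomposition holds exactly for these empirical moments
assert np.isclose(mse, var + bias ** 2)
```

Note that `np.var` with its default `ddof=0` (population variance) makes the identity hold exactly as an algebraic fact, not merely approximately.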
2. Suppose that the random variable Y has a two-component normal mixture density
f(y; θ) = p1 φ(y; µ1, σ1²) + p2 φ(y; µ2, σ2²), (1)
where p2 = 1 − p1, p1 lies in the interval [0, 1], and
θ = (p1, µ1, µ2, σ1², σ2²)ᵀ.
Here φ(y; µ, σ²) denotes the univariate normal density with mean µ and variance σ².
One way to conceptualize this mixture density is that the observation y has been made on a random variable Y that comes from Class i with probability pi, in which it has density φ(y; µi, σi²) (i = 1, 2).
Let y1 , . . . , yn denote an observed sample of size n from the mixture density (1). In order to apply the EM algorithm to the calculation of the maximum likelihood estimate of θ, the observed data-vector
g = (y1 , . . . , yn )T
is conceptualized as being incomplete, and the complete-data vector α is defined as
α = (gᵀ, zᵀ)ᵀ,
where z = (z1, . . . , zn)ᵀ and zj is defined to be one or zero according as yj arose, or did not arise, from Class 1 (j = 1, . . . , n).
(i) Write down the incomplete-data log likelihood function log L(θ). [1 mark ]
(ii) Write down the complete-data log likelihood function log Lc(θ). [1 mark ]
(iii) On applying the EM algorithm in the above EM framework, calculate the so-called Q-function on the (k + 1)th iteration of the E-step, where
Q(θ, θ(k)) = Eθ(k){log Lc(θ) | g},
showing that it is a linear function of the posterior probabilities τi(yj; θ(k)), where
τi(yj; θ(k)) = pi(k) φ(yj; µi(k), σi(k)²) / f(yj; θ(k))
is the posterior probability that the jth observation arose from Class i given yj (i = 1, 2; j = 1, . . . , n). [3 marks]
(iv) State how θ(k+1) is defined on the M-step. [2 marks]
(v) Derive the equation for p1(k+1). [3 marks]
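The E- and M-steps for the mixture (1) can be sketched numerically. The following is an illustrative implementation (not part of the exam; the simulated data, starting values, and iteration count are arbitrary assumptions), with the E-step computing the posterior probabilities and the M-step performing the weighted updates:

```python
import numpy as np

def phi(y, mu, sig2):
    # univariate normal density with mean mu and variance sig2
    return np.exp(-(y - mu) ** 2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)

# arbitrary synthetic data from a two-component mixture (illustration only)
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-2, 1.0, 150), rng.normal(3, 1.5, 100)])

p1, mu1, mu2, s1, s2 = 0.5, -1.0, 1.0, 1.0, 1.0   # arbitrary starting values
loglik = []
for _ in range(200):
    # E-step: posterior probabilities tau_1(y_j; theta^(k))
    a1, a2 = p1 * phi(y, mu1, s1), (1 - p1) * phi(y, mu2, s2)
    tau1 = a1 / (a1 + a2)
    loglik.append(np.log(a1 + a2).sum())
    # M-step: weighted updates of the mixing proportion, means, variances
    p1  = tau1.mean()
    mu1 = (tau1 * y).sum() / tau1.sum()
    mu2 = ((1 - tau1) * y).sum() / (1 - tau1).sum()
    s1  = (tau1 * (y - mu1) ** 2).sum() / tau1.sum()
    s2  = ((1 - tau1) * (y - mu2) ** 2).sum() / (1 - tau1).sum()

# EM guarantees the observed-data log likelihood never decreases
assert all(b >= a - 1e-8 for a, b in zip(loglik, loglik[1:]))
```

The monotonicity check at the end reflects the defining property of the EM algorithm: each M-step maximizes the Q-function, which cannot decrease the incomplete-data log likelihood.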
3. Let Xi1, . . . , Xini denote a random sample of size ni from a distribution assumed to be normal with mean µi and a common variance σ² (i = 1, 2). It is assumed that these two random samples are independent of each other.
(i) Write down the likelihood equation and give its solution, yielding the ML estimates of µ1 , µ2 , and σ2. Note you are not required to verify that the solution is a (global) maximizer of the likelihood function. [2 marks]
(ii) Consider the test of
H0 : µ1 = µ2 vs. H1 : µ1 ≠ µ2,
on the basis of the test statistic
T = (X̄1 − X̄2) / (s √(1/n1 + 1/n2)),
where
s² = (n1 + n2) σ̂² / (n1 + n2 − 2)
is the pooled bias-corrected estimate of σ² formed from the ML estimate σ̂² in (i).
If Tobs denotes the observed value of this test statistic T, define the P-value. [2 marks]
(iii) Explain why there is no uniformly most powerful (UMP) test of the hypothesis H0 versus H1 . [2 marks]
(iv) Derive the likelihood ratio test of H0 versus H1 , showing that the rejection of H0 for sufficiently small values of the likelihood ratio λ is equivalent to the rejection of H0 for sufficiently large values of T2 . [4 marks]
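The equivalence asked for in part (iv) can be checked numerically: for normal likelihoods, λ = (σ̂1²/σ̂0²)^(n/2), and λ^(2/n) = 1/(1 + T²/(n1 + n2 − 2)), so small λ corresponds to large T². The following sketch (not part of the exam; the two data vectors are arbitrary) verifies this identity:

```python
import numpy as np

# Illustrative check that the likelihood ratio is a decreasing function
# of T^2 via: lambda^(2/n) = 1 / (1 + T^2 / (n1 + n2 - 2)).
x1 = np.array([4.1, 5.3, 3.8, 4.9, 5.7, 4.4])   # arbitrary sample 1
x2 = np.array([6.0, 5.2, 6.8, 5.9, 6.3])        # arbitrary sample 2
n1, n2 = len(x1), len(x2)
n = n1 + n2

# residual sums of squares under H1 (separate means) and H0 (common mean)
ss1 = ((x1 - x1.mean()) ** 2).sum() + ((x2 - x2.mean()) ** 2).sum()
xbar = np.concatenate([x1, x2]).mean()
ss0 = ((x1 - xbar) ** 2).sum() + ((x2 - xbar) ** 2).sum()

lam = (ss1 / ss0) ** (n / 2)      # likelihood ratio lambda
s2 = ss1 / (n - 2)                # pooled variance estimate
T = (x1.mean() - x2.mean()) / np.sqrt(s2 * (1 / n1 + 1 / n2))

assert np.isclose(lam ** (2 / n), 1 / (1 + T ** 2 / (n - 2)))
```

The key algebraic step is ss0 = ss1 + (n1 n2 / n)(x̄1 − x̄2)², i.e. the between-group sum of squares is exactly what H0 adds to the residual sum of squares.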
4. Let α = (x1, x2, . . . , xn) be an iid sample of size n > 1 from a Uniform[0, θ] distribution with pdf
f(x | θ) = 1/θ , x ∈ [0, θ],
and 0 otherwise, where θ > 0 is an unknown parameter.
Consider a prior distribution for θ given by θ ~ Pareto(M, k) with pdf
f(θ) = kM^k / θ^(k+1) , θ > M,
and 0 otherwise, where M > 0 is a location hyperparameter and k > 1 is a shape hyperparameter.
(i) Write down the likelihood of the data α given the parameter θ . [1 mark ]
(ii) Show that the posterior pdf of θ given α = (x1, x2, . . . , xn) is
f(θ | α) ∝ 1/θ^(n+k+1) , θ > max{M, x1, x2, . . . , xn},
and identify its distribution. [3 marks]
(iii) Is the prior distribution θ ~ Pareto(M, k) conjugate for this problem? [1 mark ]
(iv) Show that the mean of any Pareto(B, a) distribution with location parameter B > 0 and shape parameter a > 1 is given by aB/(a - 1). [2 marks]
(v) Using parts (ii), (iii) and (iv), or otherwise, show that the posterior mean of θ given α is
E(θ | α) = (n + k)/(n + k − 1) · max{M, x1, x2, . . . , xn} . [1 mark ]
(vi) A well-known bias-corrected estimator for θ (Gibbons, 1974) is given by
θ̂ = ((n + 1)/n) · max{x1, x2, . . . , xn} .
By comparing the expression from part (v) to θ̂, or otherwise, give an interpretation of the effect of the hyperparameters M and k on the posterior mean as an estimator for θ. (Hint: You may use the concept of equivalent prior observations.) [2 marks]
(vii) Let θ* be the true value of θ that is generating the data. What happens to the posterior mean of θ as the sample size n → ∞ if
a. the chosen hyperparameter M value happens to be bigger than θ* ?
b. the chosen hyperparameter M value happens to be less than θ* ? [2 marks]
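The Pareto-posterior calculation can be illustrated numerically. The sketch below (not part of the exam; the data vector and hyperparameter values are arbitrary assumptions) computes the posterior mean and compares it with the bias-corrected maximum-based estimator:

```python
import numpy as np

# Illustrative sketch: posterior mean of theta under a Pareto(M, k) prior
# for Uniform[0, theta] data, versus the bias-corrected estimator.
x = np.array([2.1, 4.7, 3.3, 0.9, 4.2])   # arbitrary Uniform[0, theta] sample
n = len(x)
M, k = 1.0, 2.0                           # arbitrary hyperparameters

B = max(M, x.max())                       # posterior is Pareto(B, k + n)
post_mean = (k + n) * B / (k + n - 1)     # Pareto mean aB/(a - 1), a = k + n
bias_corrected = (n + 1) / n * x.max()    # (n+1)/n times the sample maximum
```

Here the prior acts like k extra "prior observations" capped at M: as n grows, k + n is dominated by n and the posterior mean behaves like the bias-corrected estimator, provided M does not exceed the true θ.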
5. Consider a bivariate distribution with joint pdf given by
f (x, y) = c exp{-xy - 2x - 3y} , x > 0, y > 0,
where c is a normalizing constant.
(i) Show that the conditional pdf f(x | y) is given by f(x | y) ∝ exp{−x(y + 2)} . What distribution does this correspond to? [2 marks]
Similarly, it can also be shown that the conditional pdf f(y | x) is given by
f(y | x) ∝ exp{−y(x + 3)} .
(ii) Using the above results, or otherwise, describe how you could sample from the joint distribution f (x, y). [4 marks]
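Since both full conditionals are exponential, one natural answer to (ii) is a Gibbs sampler that alternates draws from X | Y = y ~ Exp(rate = y + 2) and Y | X = x ~ Exp(rate = x + 3). A minimal sketch (not part of the exam; starting values, iteration count, and burn-in are arbitrary choices):

```python
import numpy as np

# Illustrative Gibbs sampler for the joint density
# f(x, y) proportional to exp{-xy - 2x - 3y}, x > 0, y > 0.
rng = np.random.default_rng(42)
n_iter, burn = 5000, 500
x, y = 1.0, 1.0                        # arbitrary starting values
samples = np.empty((n_iter, 2))
for t in range(n_iter):
    x = rng.exponential(1 / (y + 2))   # X | Y=y ~ Exp(rate y + 2)
    y = rng.exponential(1 / (x + 3))   # Y | X=x ~ Exp(rate x + 3)
    samples[t] = x, y
samples = samples[burn:]               # discard burn-in draws
```

Note that NumPy's `exponential` is parameterized by the scale (mean), hence the reciprocal of the rate; after burn-in the pairs are (dependent) draws approximately from f(x, y).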
6. An online poll of 61 viewers on their opinions of Season 8 of the TV series “Game of Thrones” returned the following counts:
Positive Neutral Negative Total
23 7 31 61
Let p1 , p2 and p3 = 1 - p1 - p2 be the underlying proportion of all viewers that have a positive, neutral and negative opinion of Season 8 of Game of Thrones, respectively.
(i) Assuming the data come from a multinomial distribution, show that the prior
(p1 , p2 ) ~ Dirichlet(α1 , α2 , α3 ) with hyperparameters α1 , α2 , α3 > 0 ,
is conjugate for this problem. [Recall: the joint pdf of (p1 , p2 ) ~ Dirichlet(α1 , α2 , α3 ) is
f(p1, p2) ∝ p1^(α1−1) p2^(α2−1) (1 − p1 − p2)^(α3−1) , 0 ≤ p1, p2 ≤ 1.] [3 marks]
(ii) It is given to you that
∫0^(1−q2) q1^(λ1−1) (1 − q1 − q2)^(λ3−1) dq1 ∝ (1 − q2)^(λ1+λ3−1) .
Use this to show that if (q1, q2) ~ Dirichlet(λ1, λ2, λ3) then the marginal distribution of q2 is Beta(λ2, λ1 + λ3). [3 marks]
(iii) Using part (ii), or otherwise, show that the marginal posterior for p2 is
p2 | data ~ Beta(α2 + 7, α1 + α3 + 23 + 31) . [3 marks]
(iv) The posterior mean of p2 is
E(p2 | data) = (α2 + 7) / (α1 + α2 + α3 + 61) .
Give an interpretation of this posterior mean by completing the following sentence:
“The effect of the prior on the posterior mean of p2 is like . . . ” [2 marks]
(v) In light of how the data were collected, or otherwise, argue why the posterior mean in (iv) may lead to a better estimate of the underlying proportion of neutral viewers than the frequentist estimate p̂2 = 7/61 ≈ 11%. [1 mark ]
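The conjugate update in this question is a one-line computation. The sketch below (not part of the exam; the uniform Dirichlet(1, 1, 1) prior is an assumed choice for illustration) forms the posterior and the marginal posterior mean of p2:

```python
import numpy as np

# Illustrative Dirichlet-multinomial update for the poll counts.
counts = np.array([23, 7, 31])        # positive, neutral, negative
alpha  = np.array([1.0, 1.0, 1.0])    # assumed uniform Dirichlet prior
post   = alpha + counts               # conjugacy: posterior Dirichlet params

# marginal posterior of p2 is Beta(post[1], post[0] + post[2])
a, b = post[1], post[0] + post[2]
post_mean_p2 = a / (a + b)            # Beta mean a/(a + b)
freq_p2 = counts[1] / counts.sum()    # frequentist estimate 7/61
```

With this prior the posterior mean is 8/64 = 0.125, slightly above the frequentist 7/61: the hyperparameters act like three extra pseudo-observations, one in each category, pulling the estimate toward the prior mean of 1/3.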