STAT 7100: Introduction to Advanced Statistical Inference
Examples and Illustrations for part B of learning unit 6
Example: Asymptotic statistics for binary sampling. Suppose X = (X1, . . . , Xn) is an IID random sample of binary data (i.e., zeros and ones) such that θ is the probability of observing the value one on any given trial. The common mean and variance of each Xi are Eθ[Xi] = θ and Vθ[Xi] = θ(1 − θ), and the common PMF is f_{X,θ}(x) = θ^x (1 − θ)^{1−x}. Thus, the common log-likelihood function of each Xi, and its first derivative, are
$$ l_0(\theta; x) = \log f_{X,\theta}(x) = x\log\theta + (1-x)\log(1-\theta), $$
$$ l_0'(\theta; x) = \frac{x}{\theta} - \frac{1-x}{1-\theta} = \frac{x-\theta}{\theta(1-\theta)}, $$
from which the common Fisher information of each Xi is deduced as
$$ I_0(\theta) = V_\theta[\,l_0'(\theta; X_i)\,] = \frac{V_\theta[X_i]}{\{\theta(1-\theta)\}^2} = \frac{1}{\theta(1-\theta)}. $$
Recall that the MLE for θ is θ̂ = (1/n) Σ_{i=1}^n Xi = X̄. A simple calculation provides
$$ V_\theta[\sqrt{n}\,\hat\theta] = n\,V_\theta[\bar X] = \theta(1-\theta) = \frac{1}{I_0(\theta)}. $$
Thus, here, not only is 1/I_0(θ) the asymptotic variance of θ̂, but it is the exact variance of √n θ̂.
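As a quick check, the following Python sketch (an illustration, not part of the original notes) simulates repeated binary samples and compares the empirical variance of √n θ̂ with θ(1 − θ); the values of θ, n, and the number of replications are arbitrary choices.

```python
# Minimal simulation sketch: the exact variance of sqrt(n) * theta_hat should
# match theta * (1 - theta) = 1 / I_0(theta) in the binary sampling model.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 50, 200_000          # arbitrary illustrative choices

x = rng.binomial(1, theta, size=(reps, n))  # reps independent samples of size n
theta_hat = x.mean(axis=1)                  # MLE = sample mean, one per replication

print("empirical Var[sqrt(n) * theta_hat]:", n * theta_hat.var())
print("theoretical theta * (1 - theta):   ", theta * (1 - theta))
```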
The associated log-likelihood function of the sample, X = (X1 , . . . , Xn ), and its first derivative are
$$ l(\theta; x) = \sum_{i=1}^{n} \log f_{X,\theta}(x_i) = n\left\{\bar x\log\theta + (1-\bar x)\log(1-\theta)\right\}, $$
$$ l'(\theta; x) = n\left\{\frac{\bar x}{\theta} - \frac{1-\bar x}{1-\theta}\right\} = \frac{n(\bar x-\theta)}{\theta(1-\theta)}. $$
Relevant test statistics for a test of H0 : θ = θ0 versus H1 : θ ≠ θ0 are derived as follows.
● The maximum likelihood ratio test statistic has
$$ \log\rho(x) = l(\hat\theta; x) - l(\theta_0; x) = n\left\{\bar x\log\frac{\bar x}{\theta_0} + (1-\bar x)\log\frac{1-\bar x}{1-\theta_0}\right\}. $$
● The Wald statistics are
$$ W(X) = \sqrt{n I_0(\theta_0)}\,(\hat\theta - \theta_0) = \frac{\bar X - \theta_0}{\sqrt{\theta_0(1-\theta_0)/n}}, $$
$$ \hat W(X) = \sqrt{n I_0(\hat\theta)}\,(\hat\theta - \theta_0) = \frac{\bar X - \theta_0}{\sqrt{\bar X(1-\bar X)/n}}. $$
● The score statistics are
$$ W(X) = \frac{l'(\theta_0; X)}{\sqrt{n I_0(\theta_0)}} = \frac{\bar X - \theta_0}{\sqrt{\theta_0(1-\theta_0)/n}}, $$
$$ \hat W(X) = \frac{l'(\theta_0; X)}{\sqrt{n I_0(\hat\theta)}} = \frac{\sqrt{n\,\bar X(1-\bar X)}\,(\bar X - \theta_0)}{\theta_0(1-\theta_0)}. $$
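For concreteness, here is a short Python sketch (an illustration, not part of the original notes) that evaluates 2 log ρ(x) and the two Wald statistics for a single simulated sample; θ0, the data-generating θ, and n are arbitrary choices. In this model the score statistic based on I_0(θ0) coincides algebraically with the corresponding Wald statistic, so it is not computed separately.

```python
# Sketch: compute the likelihood ratio and Wald statistics for binary data.
import numpy as np

rng = np.random.default_rng(1)
theta0, theta_true, n = 0.5, 0.6, 100        # arbitrary illustrative choices
x = rng.binomial(1, theta_true, size=n)
xbar = x.mean()                              # MLE theta_hat

log_rho = n * (xbar * np.log(xbar / theta0)
               + (1 - xbar) * np.log((1 - xbar) / (1 - theta0)))
W_wald  = (xbar - theta0) / np.sqrt(theta0 * (1 - theta0) / n)   # uses I0(theta0)
W_waldh = (xbar - theta0) / np.sqrt(xbar * (1 - xbar) / n)       # uses I0(theta_hat)

print("2 log rho:", 2 * log_rho, " Wald:", W_wald, " Wald (plug-in):", W_waldh)
```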
Example: Multinomial sampling. Suppose X = (X1, . . . , Xn) is an IID random sample of multinomial data in categories that define the cells of a table with r rows and c columns; thus, there are N = rc total cells. Accordingly, let us use array subscripting to index each component random vector, Xi = (Xi,11, . . . , Xi,1c, . . . , Xi,r1, . . . , Xi,rc), where
$$ X_{i,jk} = \begin{cases} 1 & \text{if the } i\text{'th measurement falls in row } j \text{ and column } k,\\ 0 & \text{otherwise.} \end{cases} $$
Total counts are summarized in a multinomial random vector Y = (Y11, . . . , Y1c, . . . , Yr1, . . . , Yrc) such that Yjk = Σ_{i=1}^n Xi,jk is the total count of measurements falling in row j and column k. The parameter is θ = (θ11, . . . , θ1c, . . . , θr1, . . . , θrc), where θjk is the probability that the i'th measurement falls in row j and column k. The data and parameter may be envisioned as two-dimensional tables
                 col. k
  row j      1    · · ·    c    |  tot.
    1       Y11   · · ·   Y1c   |  Y1●
    ⋮         ⋮               ⋮     |   ⋮
    r       Yr1   · · ·   Yrc   |  Yr●
   tot.     Y●1   · · ·   Y●c   |   n

and

                 col. k
  row j      1    · · ·    c    |  tot.
    1       θ11   · · ·   θ1c   |  θ1●
    ⋮         ⋮               ⋮     |   ⋮
    r       θr1   · · ·   θrc   |  θr●
   tot.     θ●1   · · ·   θ●c   |   1
Note that, since Σ_{j=1}^r Σ_{k=1}^c θjk = 1, only rc − 1 of the parameters are “free.” That is, the values of any subset of rc − 1 of the parameters θjk automatically determine the value of the remaining parameter.
The common PMF of each Xi, evaluated at x = (x11, . . . , x1c, . . . , xr1, . . . , xrc), is
$$ f_{X,\theta}(x) = \prod_{j=1}^{r}\prod_{k=1}^{c} \theta_{jk}^{\,x_{jk}}. $$
The log-likelihood of the sample X is
$$ l(\theta; X) = \sum_{i=1}^{n} \log f_{X,\theta}(X_i) = \sum_{i=1}^{n}\sum_{j=1}^{r}\sum_{k=1}^{c} X_{i,jk}\log\theta_{jk} = \sum_{j=1}^{r}\sum_{k=1}^{c} Y_{jk}\log\theta_{jk}. $$
It is not difficult to derive the MLE, θ̂ = (θ̂11, . . . , θ̂1c, . . . , θ̂r1, . . . , θ̂rc), from this expression by constrained maximization under the constraint Σ_{j=1}^r Σ_{k=1}^c θjk = 1; the result sets each θ̂jk = (1/n) Σ_{i=1}^n Xi,jk = Yjk/n.
Suppose that the null hypothesis is H0 : θjk = θj● θ●k, which specifies that the parameter, θ, represents the joint distribution of two independent random variables. Under H0, since Σ_{j=1}^r θj● = 1 and Σ_{k=1}^c θ●k = 1, only r + c − 2 = (r − 1) + (c − 1) parameters are free: the values of any subset of r − 1 of the parameters θj● and any subset of c − 1 of the parameters θ●k automatically determine the values of all of the parameters θjk. Furthermore, maximizing l(θ; X) subject to the constraints θjk = θj● θ●k sets each θ̂jk = (Yj●/n)(Y●k/n).
The maximum likelihood ratio test statistic for a test of H0 : θjk = θj ● θ●k versus H1 : not H0 therefore has
$$ \log\rho(x) = \sup\{l(\theta; x) : \theta \in \Theta_0 \cup \Theta_1\} - \sup\{l(\theta; x) : \theta \in \Theta_0\} = \sum_{j=1}^{r}\sum_{k=1}^{c} Y_{jk}\log\frac{n\,Y_{jk}}{Y_{j\bullet}\,Y_{\bullet k}}. $$
Asymptotic analysis indicates that 2 log ρ(x) is approximately chi-square, where the associated degrees of freedom is the difference in “free” parameters between H0 and H1. Under H1, there are rc − 1 free parameters; under H0 there are r + c − 2; the difference is ν = (rc − 1) − (r + c − 2) = rc − r − c + 1 = (r − 1)(c − 1).
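The following Python sketch (an illustration, not from the original notes) carries out this test on a hypothetical 2 × 3 table of counts, computing 2 log ρ and comparing it with the chi-square reference with (r − 1)(c − 1) degrees of freedom; the counts are made up purely for illustration.

```python
# Sketch: likelihood ratio test of independence for an r x c table of counts.
import numpy as np
from scipy import stats

Y = np.array([[20, 30, 25],
              [15, 40, 20]], dtype=float)               # hypothetical 2 x 3 table
n = Y.sum()
expected = np.outer(Y.sum(axis=1), Y.sum(axis=0)) / n   # n * (Yj./n) * (Y.k/n)

G2 = 2 * np.sum(Y * np.log(Y / expected))               # 2 log rho (assumes Yjk > 0)
df = (Y.shape[0] - 1) * (Y.shape[1] - 1)
print("2 log rho =", G2, " df =", df, " p-value =", stats.chi2.sf(G2, df))
```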
Example: Gamma sampling. Suppose X = (X1, . . . , Xn) is an IID random sample such that each Xi ~ gamma(θ, β), for a fixed β. That is, the target of inference is the shape parameter of a gamma distribution, assuming a fixed scale parameter. The common mean and variance of each Xi are Eθ[Xi] = θβ and Vθ[Xi] = θβ². The mean formula yields the method of moments estimator θ̂MOM = X̄/β, and the variance formula provides that its asymptotic variance is
$$ V_\theta[\sqrt{n}\,\hat\theta_{\mathrm{MOM}}] = n\,V_\theta[\bar X/\beta] = V_\theta[X_i]/\beta^2 = \theta. $$
The common PDF of each Xi is f_{X,θ}(x) = x^{θ−1} e^{−x/β}/{Γ(θ)β^θ}; hence, the common log-likelihood function of each Xi, and its first derivative, are
$$ l_0(\theta; X_i) = \log f_{X,\theta}(X_i) = (\theta-1)\log X_i - X_i/\beta - \log\Gamma(\theta) - \theta\log\beta, $$
$$ l_0'(\theta; X_i) = \log X_i - \psi(\theta) - \log\beta, $$
where ψ(z) = Γ′(z)/Γ(z) is the digamma function. The log-likelihood function of the sample, and its derivative, are
$$ l(\theta; X) = (\theta-1)\sum_{i=1}^{n}\log X_i - \sum_{i=1}^{n} X_i/\beta - n\log\Gamma(\theta) - n\theta\log\beta, $$
$$ l'(\theta; X) = \sum_{i=1}^{n}\log X_i - n\psi(\theta) - n\log\beta. $$
The MLE θˆMLE is therefore the solution to
$$ \psi(\hat\theta_{\mathrm{MLE}}) = \frac{1}{n}\sum_{i=1}^{n}\log X_i - \log\beta, $$
but cannot be written in closed form. Its asymptotic variance is implied by the common Fisher information of each Xi, which, with some effort, is derived as
$$ I_0(\theta) = V_\theta[\,l_0'(\theta; X_i)\,] = V_\theta[\log X_i] = \psi'(\theta). $$
(To see that Vθ[log Xi] = ψ′(θ), start by deriving the MGF of log Xi.) The asymptotic variance of the MLE is therefore 1/I0(θ) = 1/ψ′(θ).
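Because the likelihood equation has no closed-form solution, θ̂MLE must be found numerically. The sketch below (not part of the original notes) does so with SciPy's digamma function and a bracketing root finder; the simulated data, true parameter values, and bracketing interval are arbitrary illustrative choices.

```python
# Sketch: solve psi(theta_hat) = mean(log X) - log(beta) numerically.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(2)
theta_true, beta, n = 2.5, 1.5, 500                 # arbitrary illustrative choices
x = rng.gamma(shape=theta_true, scale=beta, size=n)

rhs = np.mean(np.log(x)) - np.log(beta)
theta_mle = brentq(lambda t: digamma(t) - rhs, 1e-6, 100.0)  # root of l'(theta; X) = 0
theta_mom = x.mean() / beta

print("MLE:", theta_mle, " MOM:", theta_mom)
```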
The asymptotic relative efficiency of θˆMLE to θˆMOM is therefore
ARE(θ; θˆMLE , θˆMOM ) = θψ′ (θ).
This is plotted for a range of values as follows.
asymptotic relative efficiency
6
5
4
3
2
1
0
0 |
0.2 |
0.4 |
0.6 |
0.8 |
1 θ |
1.2 |
1.4 |
1.6 |
1.8 |
2 |
This suggests that the method of moments estimator is especially inefficient for small θ, but becomes increasingly efficient as θ grows larger.
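A short sketch (not part of the original notes) that reproduces the plotted quantity, using the trigamma function ψ′(θ) = polygamma(1, θ) from SciPy; the grid of θ values is an arbitrary choice.

```python
# Sketch: tabulate ARE(theta; MLE, MOM) = theta * psi'(theta) over 0 < theta <= 2.
import numpy as np
from scipy.special import polygamma

theta = np.linspace(0.05, 2.0, 40)
are = theta * polygamma(1, theta)        # trigamma gives psi'(theta)
for t, a in zip(theta[::8], are[::8]):
    print(f"theta = {t:4.2f}   ARE = {a:6.2f}")
```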
Example: Binary sampling. Suppose X = (X1, . . . , Xn) is an IID random sample of binary data (i.e., zeros and ones) such that θ is the probability of observing the value one on any given trial. The common mean and variance of each Xi are Eθ[Xi] = θ and Vθ[Xi] = θ(1 − θ), and the common PMF is f_{X,θ}(x) = θ^x (1 − θ)^{1−x}.
In a previous example, the relevant Wald statistic for a test of H0 : θ = θ0 versus H1 : θ ≠ θ0 was deduced as
$$ \hat W(X) = \sqrt{n I_0(\hat\theta)}\,(\hat\theta - \theta_0) = \frac{\bar X - \theta_0}{\sqrt{\bar X(1-\bar X)/n}}, $$
where the estimated variance is $\widehat{\mathrm{Var}}_\theta[\bar X] = \hat\theta(1-\hat\theta)/n = \bar X(1-\bar X)/n$.
The corresponding pivot is Q(θ; X) = √n (X̄ − θ)/√(X̄(1 − X̄)), whose asymptotic distribution is standard Gaussian. By pivoting around Q(θ; X), one sees that a subset estimator with approximate confidence coefficient 1 − α consists of all θ falling between the values
$$ \bar X \pm z_{1-\alpha/2}\sqrt{\bar X(1-\bar X)/n}. $$
An alternative statistic was deduced as
$$ W(X) = \frac{l'(\theta_0; X)}{\sqrt{n I_0(\theta_0)}} = \frac{\bar X - \theta_0}{\sqrt{\theta_0(1-\theta_0)/n}}, $$
whose asymptotic distribution under H0 is also standard Gaussian. The corresponding decision rule for an approximate size-α test is to reject H0 when |W(X)| > z_{1−α/2}. An alternative subset estimator may be derived by inverting the hypothesis test based on W(X). This produces
$$ C(X) = \left\{\theta_0 : \left|\frac{\bar X - \theta_0}{\sqrt{\theta_0(1-\theta_0)/n}}\right| < z_{1-\alpha/2}\right\}, $$
which is alternatively characterized as
$$ C(X) = \left\{\theta : \left(1 + \frac{z_{1-\alpha/2}^2}{n}\right)\theta^2 - \left(2\bar X + \frac{z_{1-\alpha/2}^2}{n}\right)\theta + \bar X^2 < 0\right\}. $$
By the quadratic formula, the interval consists of the θ between the values
$$ \frac{\bar X + \dfrac{z_{1-\alpha/2}^2}{2n} \;\pm\; z_{1-\alpha/2}\sqrt{\dfrac{\bar X(1-\bar X) + z_{1-\alpha/2}^2/(4n)}{n}}}{1 + \dfrac{z_{1-\alpha/2}^2}{n}}. $$
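To compare the two interval estimators, the following Python sketch (an illustration, not from the original notes) computes both the Wald interval and the score-based (Wilson) interval for a single simulated sample; n, the data-generating θ, and α are arbitrary choices.

```python
# Sketch: Wald interval versus the interval obtained by inverting the score test.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
theta_true, n, alpha = 0.3, 80, 0.05                # arbitrary illustrative choices
x = rng.binomial(1, theta_true, size=n)
xbar, z = x.mean(), norm.ppf(1 - alpha / 2)

# Wald interval: xbar +/- z * sqrt(xbar * (1 - xbar) / n)
half_wald = z * np.sqrt(xbar * (1 - xbar) / n)
wald = (xbar - half_wald, xbar + half_wald)

# Score (Wilson) interval: roots of the quadratic in theta displayed above
centre = xbar + z**2 / (2 * n)
half = z * np.sqrt((xbar * (1 - xbar) + z**2 / (4 * n)) / n)
wilson = ((centre - half) / (1 + z**2 / n), (centre + half) / (1 + z**2 / n))

print("Wald:  ", wald)
print("Wilson:", wilson)
```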
Example: Negative binomial sampling. Suppose X = (X1, . . . , Xn) is an IID random sample such that the common distribution of each Xi is negative binomial with parameters r and θ. Suppose further that n is not necessarily large but, instead, θ is small. That is, we are thinking of asymptotic analysis as θ → 0.
For insight into this setup, consider two properties of the negative binomial distribution that are quickly derived from its MGF, which for each Xi in this example (taking Xi to count the number of failures before the r'th success) is
$$ M_X(t) = \left(\frac{\theta}{1-(1-\theta)e^{t}}\right)^{r}. $$