STAT 7100: Introduction to Advanced Statistical Inference

Examples and Illustrations for part B of learning unit 6

Example: Asymptotic statistics for binary sampling. Suppose $\boldsymbol{X} = (X_1, \ldots, X_n)$ is an IID random sample of binary data (i.e., zeros and ones) such that $\theta$ is the probability of observing the value one on any given trial. The common mean and variance of each $X_i$ are $E_\theta[X_i] = \theta$ and $V_\theta[X_i] = \theta(1 - \theta)$, and the common PMF is $f_{X,\theta}(x) = \theta^x (1 - \theta)^{1-x}$. Thus, the common log-likelihood function of each $X_i$, and its first derivative, are

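$$l_0(\theta; x) = \log f_{X,\theta}(x) = x \log\theta + (1 - x)\log(1 - \theta)$$

$$l_0'(\theta; x) = \frac{x}{\theta} - \frac{1 - x}{1 - \theta} = \frac{x}{\theta(1 - \theta)} - \frac{1}{1 - \theta},$$

from which the common Fisher information of each $X_i$ is deduced as

$$I_0(\theta) = V_\theta[l_0'(\theta; X_i)] = \frac{V_\theta[X_i]}{\{\theta(1 - \theta)\}^2} = \frac{1}{\theta(1 - \theta)}.$$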

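Recall that the MLE for $\theta$ is $\hat\theta = \frac{1}{n}\sum_{i=1}^n X_i = \bar X$. A simple calculation provides

$$V_\theta[\sqrt{n}\,\hat\theta] = n V_\theta[\bar X] = \theta(1 - \theta) = \frac{1}{I_0(\theta)}.$$

Thus, here, not only is $1/I_0(\theta)$ the asymptotic variance of $\hat\theta$, but it is the exact variance of $\sqrt{n}\,\hat\theta$.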
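The exact-variance claim can be checked without simulation by summing over the binomial distribution of the number of ones. The following is a minimal sketch under that approach; the function names are illustrative and do not come from the notes.

```python
from math import comb

def exact_variance_of_scaled_mle(n: int, theta: float) -> float:
    """Exact variance of sqrt(n) * theta_hat, where theta_hat = S / n
    and S ~ Binomial(n, theta) is the number of ones in the sample."""
    # PMF of the success count S.
    pmf = [comb(n, s) * theta**s * (1 - theta) ** (n - s) for s in range(n + 1)]
    mean = sum(p * s / n for s, p in enumerate(pmf))
    var_theta_hat = sum(p * (s / n - mean) ** 2 for s, p in enumerate(pmf))
    return n * var_theta_hat

def fisher_information(theta: float) -> float:
    """Common Fisher information I0(theta) = 1 / {theta (1 - theta)}."""
    return 1.0 / (theta * (1.0 - theta))

# The two printed columns should agree for every n, not only for large n.
for n in (5, 20, 100):
    print(n, exact_variance_of_scaled_mle(n, 0.3), 1.0 / fisher_information(0.3))
```

The agreement at small n is what separates this example from the general case, where 1/I0(θ) is only a limiting variance.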

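The associated log-likelihood function of the sample, $\boldsymbol{X} = (X_1, \ldots, X_n)$, and its first derivative are

$$l(\theta; \boldsymbol{x}) = \sum_{i=1}^n \log f_{X,\theta}(x_i) = n\left\{\bar x \log\theta + (1 - \bar x)\log(1 - \theta)\right\}$$

$$l'(\theta; \boldsymbol{x}) = n\left\{\frac{\bar x}{\theta} - \frac{1 - \bar x}{1 - \theta}\right\} = \frac{n(\bar x - \theta)}{\theta(1 - \theta)}.$$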

Relevant test statistics for a test of $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$ are derived as follows.

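● The maximum likelihood ratio test statistic has

$$\log\rho(\boldsymbol{x}) = l(\hat\theta; \boldsymbol{x}) - l(\theta_0; \boldsymbol{x}) = n\left\{\bar x \log\frac{\bar x}{\theta_0} + (1 - \bar x)\log\frac{1 - \bar x}{1 - \theta_0}\right\}.$$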

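● The Wald statistics are

$$W(\boldsymbol{X}) = \sqrt{n I_0(\theta_0)}\,(\hat\theta - \theta_0) = \frac{\bar X - \theta_0}{\sqrt{\theta_0(1 - \theta_0)/n}}$$

$$\hat W(\boldsymbol{X}) = \sqrt{n I_0(\hat\theta)}\,(\hat\theta - \theta_0) = \frac{\bar X - \theta_0}{\sqrt{\bar X(1 - \bar X)/n}}$$

● The score statistics are

$$W(\boldsymbol{X}) = \frac{l'(\theta_0; \boldsymbol{X})}{\sqrt{n I_0(\theta_0)}} = \sqrt{\theta_0(1 - \theta_0)/n}\left\{\frac{n(\bar X - \theta_0)}{\theta_0(1 - \theta_0)}\right\} = \frac{\bar X - \theta_0}{\sqrt{\theta_0(1 - \theta_0)/n}}$$

$$\hat W(\boldsymbol{X}) = \frac{l'(\theta_0; \boldsymbol{X})}{\sqrt{n I_0(\hat\theta)}} = \sqrt{\bar X(1 - \bar X)/n}\left\{\frac{n(\bar X - \theta_0)}{\theta_0(1 - \theta_0)}\right\}.$$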
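As a numerical illustration of the statistics listed above, the following minimal sketch evaluates the likelihood ratio statistic, both Wald statistics, and the score statistic standardized at θ0. The function name and the data are hypothetical, and the sketch assumes the sample mean lies strictly between 0 and 1.

```python
from math import log, sqrt

def binary_test_statistics(x, theta0):
    """Likelihood ratio, Wald, and score statistics for H0: theta = theta0.

    Assumes 0 < mean(x) < 1 so that every logarithm and square root is defined.
    """
    n = len(x)
    xbar = sum(x) / n                                   # MLE of theta
    # log rho = l(theta_hat; x) - l(theta0; x)
    log_rho = n * (xbar * log(xbar / theta0)
                   + (1 - xbar) * log((1 - xbar) / (1 - theta0)))
    se_null = sqrt(theta0 * (1 - theta0) / n)           # 1 / sqrt(n I0(theta0))
    se_hat = sqrt(xbar * (1 - xbar) / n)                # 1 / sqrt(n I0(theta_hat))
    score = n * (xbar - theta0) / (theta0 * (1 - theta0))   # l'(theta0; x)
    return {
        "two_log_rho": 2 * log_rho,
        "wald_null": (xbar - theta0) / se_null,
        "wald_hat": (xbar - theta0) / se_hat,
        "score_null": score * se_null,
    }

# Hypothetical data: 14 ones among n = 40 trials, testing theta0 = 0.5.
results = binary_test_statistics([1] * 14 + [0] * 26, 0.5)
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

In this model the Wald statistic standardized at θ0 and the score statistic standardized at θ0 coincide, so the two corresponding entries agree up to rounding.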

Example: Multinomial sampling. Suppose $\boldsymbol{X} = (\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n)$ is an IID random sample of multinomial data in categories that define the cells of a table with $r$ rows and $c$ columns; thus, $N = rc$ total cells. Accordingly, let us use array subscripting to index each component random vector, $\boldsymbol{X}_i = (X_{i,11}, \ldots, X_{i,1c}, \ldots, X_{i,r1}, \ldots, X_{i,rc})$, where

$$X_{i,jk} = \begin{cases} 1 & \text{if the } i\text{'th measurement falls in row } j \text{ and column } k \\ 0 & \text{otherwise.} \end{cases}$$

Total counts are summarized in a multinomial random vector $\boldsymbol{Y} = (Y_{11}, \ldots, Y_{1c}, \ldots, Y_{r1}, \ldots, Y_{rc})$ such that $Y_{jk} = \sum_{i=1}^n X_{i,jk}$ is the total count of measurements falling in row $j$ and column $k$. The parameter is $\boldsymbol{\theta} = (\theta_{11}, \ldots, \theta_{1c}, \ldots, \theta_{r1}, \ldots, \theta_{rc})$, where $\theta_{jk}$ is the probability that the $i$'th measurement falls in row $j$ and column $k$. The data and parameter may be envisioned as two-dimensional tables

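$$\begin{array}{c|ccc|c}
\text{row } j \,\backslash\, \text{col. } k & 1 & \cdots & c & \text{tot.} \\ \hline
1 & Y_{11} & \cdots & Y_{1c} & Y_{1\bullet} \\
\vdots & \vdots & & \vdots & \vdots \\
r & Y_{r1} & \cdots & Y_{rc} & Y_{r\bullet} \\ \hline
\text{tot.} & Y_{\bullet 1} & \cdots & Y_{\bullet c} & n
\end{array}$$

and

$$\begin{array}{c|ccc|c}
\text{row } j \,\backslash\, \text{col. } k & 1 & \cdots & c & \text{tot.} \\ \hline
1 & \theta_{11} & \cdots & \theta_{1c} & \theta_{1\bullet} \\
\vdots & \vdots & & \vdots & \vdots \\
r & \theta_{r1} & \cdots & \theta_{rc} & \theta_{r\bullet} \\ \hline
\text{tot.} & \theta_{\bullet 1} & \cdots & \theta_{\bullet c} & 1
\end{array}$$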

Note that, since $\sum_{j=1}^r \sum_{k=1}^c \theta_{jk} = 1$, only $rc - 1$ of the parameters are “free.” That is, the values of any subset of $rc - 1$ of the parameters $\theta_{jk}$ automatically determine the value of the remaining parameter.

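The common PMF of each $\boldsymbol{X}_i$, evaluated at $\boldsymbol{x} = (x_{11}, \ldots, x_{1c}, \ldots, x_{r1}, \ldots, x_{rc})$, is $f_{\boldsymbol{X},\boldsymbol{\theta}}(\boldsymbol{x}) = \prod_{j=1}^r \prod_{k=1}^c \theta_{jk}^{x_{jk}}$. The log-likelihood of the sample $\boldsymbol{X}$ is

$$l(\boldsymbol{\theta}; \boldsymbol{X}) = \sum_{i=1}^n \log f_{\boldsymbol{X},\boldsymbol{\theta}}(\boldsymbol{X}_i) = \sum_{i=1}^n \sum_{j=1}^r \sum_{k=1}^c X_{i,jk} \log\theta_{jk} = \sum_{j=1}^r \sum_{k=1}^c Y_{jk} \log\theta_{jk}.$$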

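It is not difficult to derive the MLE, $\hat{\boldsymbol{\theta}} = (\hat\theta_{11}, \ldots, \hat\theta_{1c}, \ldots, \hat\theta_{r1}, \ldots, \hat\theta_{rc})$, from this expression by constrained maximization, under the constraint $\sum_{j=1}^r \sum_{k=1}^c \theta_{jk} = 1$; the result sets each $\hat\theta_{jk} = \frac{1}{n}\sum_{i=1}^n X_{i,jk} = Y_{jk}/n$.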
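One way to carry out the constrained maximization is with a Lagrange multiplier. The sketch below introduces the multiplier $\lambda$ and the function $\mathcal{L}$, neither of which appears in the notes, and assumes every count $Y_{jk}$ is positive so that the maximizer lies in the interior of the parameter space.

```latex
\begin{align*}
\mathcal{L}(\boldsymbol{\theta}, \lambda)
  &= \sum_{j=1}^r \sum_{k=1}^c Y_{jk} \log\theta_{jk}
     - \lambda\Bigl(\sum_{j=1}^r \sum_{k=1}^c \theta_{jk} - 1\Bigr), \\
\frac{\partial \mathcal{L}}{\partial \theta_{jk}}
  &= \frac{Y_{jk}}{\theta_{jk}} - \lambda = 0
  \quad\Longrightarrow\quad \theta_{jk} = \frac{Y_{jk}}{\lambda}, \\
1 = \sum_{j=1}^r \sum_{k=1}^c \theta_{jk}
  &= \frac{1}{\lambda}\sum_{j=1}^r \sum_{k=1}^c Y_{jk} = \frac{n}{\lambda}
  \quad\Longrightarrow\quad \lambda = n,
  \qquad \hat\theta_{jk} = \frac{Y_{jk}}{n}.
\end{align*}
```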

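Suppose that the null hypothesis is $H_0: \theta_{jk} = \theta_{j\bullet}\theta_{\bullet k}$, where $\theta_{j\bullet} = \sum_{k=1}^c \theta_{jk}$ and $\theta_{\bullet k} = \sum_{j=1}^r \theta_{jk}$ are the row and column totals; this specifies that the parameter, $\boldsymbol{\theta}$, represents the joint distribution of two independent random variables. Under $H_0$, since $\sum_{j=1}^r \theta_{j\bullet} = 1$ and $\sum_{k=1}^c \theta_{\bullet k} = 1$, only $r + c - 2 = (r - 1) + (c - 1)$ parameters automatically determine the values of all of the parameters $\theta_{jk}$, having counted any subset of $r - 1$ of the parameters $\theta_{j\bullet}$ and any subset of $c - 1$ of the parameters $\theta_{\bullet k}$. Furthermore, maximizing $l(\boldsymbol{\theta}; \boldsymbol{X})$ subject to the constraints $\theta_{jk} = \theta_{j\bullet}\theta_{\bullet k}$ sets each $\hat\theta_{jk} = (Y_{j\bullet}/n)(Y_{\bullet k}/n)$.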

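The maximum likelihood ratio test statistic for a test of $H_0: \theta_{jk} = \theta_{j\bullet}\theta_{\bullet k}$ versus $H_1$: not $H_0$ therefore has

$$\log\rho(\boldsymbol{x}) = \sup\{l(\boldsymbol{\theta}; \boldsymbol{x}) : \boldsymbol{\theta} \in \Theta_0 \cup \Theta_1\} - \sup\{l(\boldsymbol{\theta}; \boldsymbol{x}) : \boldsymbol{\theta} \in \Theta_0\} = \sum_{j=1}^r \sum_{k=1}^c Y_{jk} \log\frac{Y_{jk}}{Y_{j\bullet}Y_{\bullet k}/n}.$$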

Asymptotic analysis indicates that $2\log\rho(\boldsymbol{x})$ is approximately chi-square, where the associated degrees of freedom is the difference in “free” parameters between $H_0$ and $H_1$. Under $H_1$, there are $rc - 1$ free parameters; under $H_0$ there are $r + c - 2$; the difference is $\nu = (rc - 1) - (r + c - 2) = rc - r - c + 1 = (r - 1)(c - 1)$.
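The statistic and its degrees of freedom are simple to compute from a table of counts. The following is a minimal sketch; the function name and the example tables are hypothetical, and empty cells are handled with the convention 0 log 0 = 0, which is an assumption not discussed in the notes.

```python
from math import log

def independence_lrt(table):
    """Return (2 log rho, degrees of freedom) for an r-by-c table of counts."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]                               # Y_{j.}
    col_tot = [sum(table[j][k] for j in range(r)) for k in range(c)]    # Y_{.k}
    log_rho = 0.0
    for j in range(r):
        for k in range(c):
            y = table[j][k]
            if y > 0:  # convention: an empty cell contributes 0
                log_rho += y * log(y * n / (row_tot[j] * col_tot[k]))
    return 2.0 * log_rho, (r - 1) * (c - 1)

# Hypothetical 2-by-3 table of counts.
statistic, df = independence_lrt([[12, 5, 7], [5, 16, 9]])
print(statistic, df)
```

A table whose rows are exactly proportional, such as [[10, 20], [20, 40]], gives a statistic of zero, since the unconstrained and constrained maxima then coincide.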

Example: Gamma sampling. Suppose $\boldsymbol{X} = (X_1, \ldots, X_n)$ is an IID random sample such that each $X_i \sim \text{gamma}(\theta, \beta)$, for a fixed $\beta$. That is, the target of inference is the shape parameter of a gamma distribution, assuming a fixed scale parameter. The common mean and variance of each $X_i$ are $E_\theta[X_i] = \theta\beta$ and $V_\theta[X_i] = \theta\beta^2$. The mean formula yields the method of moments estimator $\hat\theta_{MOM} = \bar X/\beta$, and the variance formula provides that its asymptotic variance is

$$V_\theta[\sqrt{n}\,\hat\theta_{MOM}] = n V_\theta[\bar X/\beta] = V_\theta[X_i]/\beta^2 = \theta.$$

The common PDF of each $X_i$ is $f_{X,\theta}(x) = x^{\theta - 1} e^{-x/\beta}/\{\Gamma(\theta)\beta^\theta\}$; hence, the common log-likelihood function of each $X_i$, and its first derivative, are

$$l_0(\theta; X_i) = \log f_{X,\theta}(X_i) = (\theta - 1)\log X_i - X_i/\beta - \log\Gamma(\theta) - \theta\log\beta$$

$$l_0'(\theta; X_i) = \log X_i - \psi(\theta) - \log\beta,$$

where $\psi(z) = \Gamma'(z)/\Gamma(z)$ is the digamma function. The log-likelihood function of the sample, and its derivative, are

$$l(\theta; \boldsymbol{X}) = (\theta - 1)\sum_{i=1}^n \log X_i - \sum_{i=1}^n X_i/\beta - n\log\Gamma(\theta) - n\theta\log\beta$$

$$l'(\theta; \boldsymbol{X}) = \sum_{i=1}^n \log X_i - n\psi(\theta) - n\log\beta.$$

The MLE $\hat\theta_{MLE}$ is therefore the solution to

$$\psi(\hat\theta_{MLE}) = \frac{1}{n}\sum_{i=1}^n \log X_i - \log\beta,$$

but cannot be written in closed form. Its asymptotic variance is implied from the common Fisher information of each $X_i$, which, with some effort, is derived as

$$I_0(\theta) = V_\theta[l_0'(\theta; X_i)] = V_\theta[\log X_i] = \psi'(\theta).$$

(To see that $V_\theta[\log X_i] = \psi'(\theta)$, start by deriving the MGF of $\log X_i$.) The asymptotic variance of the MLE is therefore $1/I_0(\theta) = 1/\psi'(\theta)$.
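Although the likelihood equation has no closed-form solution, it is easy to solve numerically. The sketch below uses Newton's method started at the method of moments estimate, which is one reasonable choice and not something the notes prescribe. The digamma and trigamma functions are approximated with a standard recurrence followed by an asymptotic series, so that only the Python standard library is needed; the function names and the data are hypothetical.

```python
from math import log

def digamma(x: float) -> float:
    """psi(x) for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x              # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return result + log(x) - 0.5 / x - series

def trigamma(x: float) -> float:
    """psi'(x) for x > 0, by the same recurrence-plus-series approach."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)        # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    tail = 1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30))
    return result + inv * (1.0 + inv * (0.5 + inv * tail))

def gamma_shape_mle(x, beta, tol=1e-10, max_iter=100):
    """Solve psi(theta) = mean(log x) - log(beta) by Newton's method."""
    n = len(x)
    target = sum(log(xi) for xi in x) / n - log(beta)
    theta = (sum(x) / n) / beta        # method of moments starting value
    for _ in range(max_iter):
        new_theta = theta - (digamma(theta) - target) / trigamma(theta)
        if new_theta <= 0.0:
            new_theta = theta / 2.0    # safeguard: stay inside theta > 0
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

# Hypothetical positive data with the scale fixed at beta = 1.
data = [1.2, 0.7, 3.4, 2.1, 0.9, 1.8]
theta_hat = gamma_shape_mle(data, beta=1.0)
print(theta_hat)
```

Because the digamma function is increasing and concave, Newton steps taken from below the root approach it monotonically; the halving safeguard only matters if a step from above would overshoot past zero.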

The asymptotic relative efficiency of $\hat\theta_{MLE}$ to $\hat\theta_{MOM}$ is therefore

$$\text{ARE}(\theta; \hat\theta_{MLE}, \hat\theta_{MOM}) = \theta\psi'(\theta).$$

This is plotted for a range of values as follows.

[Figure: plot of the asymptotic relative efficiency; axis label "asymptotic relative efficiency".]

This suggests that the method of moments estimator is especially inefficient for small $\theta$, but becomes increasingly more efficient as $\theta$ becomes larger.
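The trend described here can be reproduced numerically by evaluating θψ′(θ) on a grid. The sketch below approximates the trigamma function with a recurrence followed by an asymptotic series, which is an implementation choice made here to avoid external libraries; the function names and the grid of θ values are illustrative.

```python
def trigamma(x: float) -> float:
    """psi'(x) for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)        # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    tail = 1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30))
    return result + inv * (1.0 + inv * (0.5 + inv * tail))

def asymptotic_relative_efficiency(theta: float) -> float:
    """Ratio of the MOM asymptotic variance (theta) to the MLE's (1 / psi'(theta))."""
    return theta * trigamma(theta)

for theta in (0.1, 0.5, 1.0, 2.0, 5.0, 20.0):
    print(f"theta = {theta:5.1f}   ARE = {asymptotic_relative_efficiency(theta):.4f}")
```

The ratio stays above one for every θ and decreases toward one as θ grows, so the method of moments estimator never beats the MLE asymptotically but loses little for large shape parameters.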

Example: Binary sampling. Suppose $\boldsymbol{X} = (X_1, \ldots, X_n)$ is an IID random sample of binary data (i.e., zeros and ones) such that $\theta$ is the probability of observing the value one on any given trial. The common mean and variance of each $X_i$ are $E_\theta[X_i] = \theta$ and $V_\theta[X_i] = \theta(1 - \theta)$, and the common PMF is $f_{X,\theta}(x) = \theta^x (1 - \theta)^{1-x}$.

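In a previous example, the relevant Wald statistic for a test of $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$ was deduced as

$$\hat W(\boldsymbol{X}) = \sqrt{n I_0(\hat\theta)}\,(\hat\theta - \theta_0) = \frac{\bar X - \theta_0}{\sqrt{\widehat{\text{Var}}_\theta[\bar X]}},$$

where $\widehat{\text{Var}}_\theta[\bar X] = 1/\{n I_0(\hat\theta)\} = \bar X(1 - \bar X)/n$.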

The corresponding pivot is $Q(\theta; \boldsymbol{X}) = \sqrt{n}(\bar X - \theta)/\sqrt{\bar X(1 - \bar X)}$, whose asymptotic distribution is standard Gaussian. By pivoting around $Q(\theta; \boldsymbol{X})$, one sees that a subset estimator with approximate confidence coefficient $1 - \alpha$ consists of all $\theta$ falling between the values

$$\bar X \pm z_{1-\alpha/2}\sqrt{\bar X(1 - \bar X)/n}.$$
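The interval obtained from this pivot is straightforward to compute. The following minimal sketch uses `statistics.NormalDist` from the Python standard library for the Gaussian quantile; the function name and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes: int, n: int, alpha: float = 0.05):
    """Approximate (1 - alpha) interval: xbar +/- z * sqrt(xbar (1 - xbar) / n)."""
    xbar = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)        # z_{1 - alpha/2}
    half_width = z * sqrt(xbar * (1 - xbar) / n)
    return xbar - half_width, xbar + half_width

# Hypothetical data: 14 ones among n = 40 trials.
lower, upper = wald_interval(14, 40)
print(lower, upper)
```

Note that the interval collapses to a single point when the sample mean is 0 or 1, because the estimated variance is then zero.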

An alternative statistic was deduced as

$$W(\boldsymbol{X}) = \frac{l'(\theta_0; \boldsymbol{X})}{\sqrt{n I_0(\theta_0)}} = \frac{\bar X - \theta_0}{\sqrt{\theta_0(1 - \theta_0)/n}},$$

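whose asymptotic distribution under $H_0$ is also standard Gaussian. The corresponding decision rule for an approximate size-$\alpha$ test is to reject $H_0$ when $|W(\boldsymbol{X})| > z_{1-\alpha/2}$. An alternative subset estimator may be derived by inverting the hypothesis test based on $W(\boldsymbol{X})$. This produces

$$C(\boldsymbol{X}) = \left\{\theta_0 : \left|\frac{\bar X - \theta_0}{\sqrt{\theta_0(1 - \theta_0)/n}}\right| < z_{1-\alpha/2}\right\},$$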

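which is alternatively characterized as

$$C(\boldsymbol{X}) = \left\{\theta : \left(1 + \frac{z_{1-\alpha/2}^2}{n}\right)\theta^2 - \left(2\bar X + \frac{z_{1-\alpha/2}^2}{n}\right)\theta + \bar X^2 < 0\right\}.$$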

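By the quadratic formula, the interval consists of the $\theta$ between the values

$$\frac{\bar X + z_{1-\alpha/2}^2/(2n)}{1 + z_{1-\alpha/2}^2/n} \pm \frac{z_{1-\alpha/2}}{1 + z_{1-\alpha/2}^2/n}\sqrt{\frac{\bar X(1 - \bar X) + z_{1-\alpha/2}^2/(4n)}{n}}.$$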
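As a check on the algebra, the endpoints produced by the quadratic formula should be exactly the values of θ0 at which the statistic W equals ±z in absolute value. The sketch below computes the interval and evaluates W at each endpoint; the function names and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def score_interval(successes: int, n: int, alpha: float = 0.05):
    """Interval obtained by inverting the test based on W(X)."""
    xbar = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)        # z_{1 - alpha/2}
    z2n = z * z / n
    center = (xbar + z2n / 2) / (1 + z2n)
    half_width = z * sqrt((xbar * (1 - xbar) + z2n / 4) / n) / (1 + z2n)
    return center - half_width, center + half_width

def score_statistic(xbar: float, theta0: float, n: int) -> float:
    """W = (xbar - theta0) / sqrt(theta0 (1 - theta0) / n)."""
    return (xbar - theta0) / sqrt(theta0 * (1 - theta0) / n)

# Hypothetical data: 14 ones among n = 40 trials.
lo, hi = score_interval(14, 40)
z = NormalDist().inv_cdf(0.975)
print(lo, hi)
print(score_statistic(14 / 40, lo, 40), score_statistic(14 / 40, hi, 40), z)
```

Unlike the interval based on the estimated variance, this one is not centered at the sample mean: its center is pulled toward 1/2 by an amount that shrinks as n grows.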

Example: Negative binomial sampling. Suppose $\boldsymbol{X} = (X_1, \ldots, X_n)$ is an IID random sample such that the common distribution of each $X_i$ is negative binomial with parameters $r$ and $\theta$. Suppose further that $n$ is not necessarily large, but, instead, $\theta$ is small. That is, we are thinking of asymptotic analysis as $\theta \to 0$.

For insight into this setup, consider two properties of the negative binomial distribution that are quickly derived from its MGF, which for each $X_i$ in this example is

MX (t) = , 、r .