STAT 7100: Introduction to Advanced Statistical Inference Examples and Illustrations for part A of learning unit 6
Example: Gaussian sampling. Suppose x = (X1, ..., Xn) is an IID random sample of univariate Gaussian data, each Xi ~ G(µ, σ²), and the parameter is θ = (µ, σ²).
For any value u, consider the hypotheses H0 : θ ∈ Θ0(u) and H1 : θ ∈ Θ1(u) defined by

Θ0(u) = {(µ, σ²) : µ = u, σ² > 0}
Θ1(u) = {(µ, σ²) : µ ≠ u, σ² > 0}.
(That is, H0 : µ = u versus H1 : µ ≠ u in the case where σ² is unknown.) Previous deductions identify that the decision rule of the size-α maximum-likelihood ratio test in this context is equivalent to the rule to reject H0 precisely when |W_u(x)| > t_{n-1,1-α/2}, where W_u(x) = √n(X̄ − u)/S and t_{n-1,1-α/2} = F⁻¹_{T,n-1}(1 − α/2) is the (1 − α/2)'th quantile of a t distribution with n − 1 degrees of freedom. By inverting this test, one finds a subset estimator for g(θ) = µ given by
C(x) = {u : |W_u(x)| ≤ t_{n-1,1-α/2}}
     = [X̄ − t_{n-1,1-α/2} S/√n, X̄ + t_{n-1,1-α/2} S/√n].
The test above is defined from the property that the statistic Q(θ; x) = √n(X̄ − µ)/S follows a t distribution with n − 1 degrees of freedom. The distribution of this statistic does not depend on θ, hence it defines a pivot. It furthermore depends on θ only through g(θ) = µ. Let t1 and t2 be any constants such that
Pθ[−t1 < Q(θ; x) < t2] = F_{T,n-1}(t2) − F_{T,n-1}(−t1) = 1 − α,
where F_{T,n-1}(t) is the CDF of a t distribution with n − 1 degrees of freedom. By pivoting on Q(θ; x), a subset estimator for g(θ) = µ, with confidence coefficient 1 − α, is defined according to
C(x) = {g(θ) : −t1 < Q(θ; x) < t2} = [X̄ − t2 S/√n, X̄ + t1 S/√n].

One possible setting has t1 = t2 = t_{n-1,1-α/2}, in which case the subset estimator is equivalent to the one found by inverting the maximum-likelihood ratio test.
Notice the following as well:
• If t2 = t_{n-1,1-α} = F⁻¹_{T,n-1}(1 − α) is the (1 − α)'th quantile of a t distribution with n − 1 degrees of freedom, then Pθ[Q(θ; x) < t2] = 1 − α, and the corresponding subset estimator is

C(x) = {g(θ) : Q(θ; x) < t2} = [X̄ − t_{n-1,1-α} S/√n, ∞),

a half-interval estimator that defines a confidence lower bound, L(x) = X̄ − t_{n-1,1-α} S/√n, on µ.
• If t1 = t_{n-1,1-α}, then Pθ[−t1 < Q(θ; x)] = 1 − α, and the corresponding subset estimator is

C(x) = {g(θ) : −t1 < Q(θ; x)} = (−∞, X̄ + t_{n-1,1-α} S/√n],

a half-interval estimator that defines a confidence upper bound, U(x) = X̄ + t_{n-1,1-α} S/√n, on µ.
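The two-sided and one-sided intervals above translate directly into a few lines of code. The following is a minimal sketch, assuming Python with numpy and scipy available (the function name t_interval and its kind argument are our own):

```python
import numpy as np
from scipy import stats

def t_interval(x, alpha=0.05, kind="two-sided"):
    """Pivot-based interval for a Gaussian mean when sigma^2 is unknown.

    Inverts the pivot Q(theta; x) = sqrt(n) * (xbar - mu) / S, which
    follows a t distribution with n - 1 degrees of freedom.
    kind is "two-sided", "lower" (bound [L(x), inf)), or "upper".
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)   # S uses the n - 1 divisor
    se = s / np.sqrt(n)
    if kind == "two-sided":
        t = stats.t.ppf(1 - alpha / 2, df=n - 1)
        return xbar - t * se, xbar + t * se
    t = stats.t.ppf(1 - alpha, df=n - 1)
    if kind == "lower":
        return xbar - t * se, np.inf    # confidence lower bound on mu
    return -np.inf, xbar + t * se       # confidence upper bound on mu
```

Note that the one-sided bounds use the (1 − α)'th quantile rather than the (1 − α/2)'th, exactly as in the bullets above.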
Example: Ratio estimation. In response surface analysis, a common problem is to estimate the point at which a polynomial function is minimized. Define the function m(u) = θ1 − 2θ2u + θ3u², and suppose its coefficients are unknown, but are to be inferred through a sequence of experiments. Let us collect them into the multivariate parameter θ = (θ1, θ2, θ3).
The quantity u indexes some characteristic of the experimental conditions that is under the control of the experimenter. By varying u across a sequence of experiments, and applying data-analysis techniques from linear-models analysis to the resulting data, it is possible to summarize the data, x, into a multivariate statistic θ̂(x) = (θ̂1(x), θ̂2(x), θ̂3(x)) that is presumed to follow a Gaussian distribution, θ̂(x) ~ G(θ, Σ).
Observe that the first two derivatives of m(u) are m′(u) = −2θ2 + 2θ3u and m″(u) = 2θ3, hence if θ3 > 0 the function m(u) is minimized at u* = θ2/θ3. This ratio, g(θ) = θ2/θ3, is the target of inference.
A subset estimator for θ2/θ3 was proposed in the 1950s by E.C. Fieller, and it is derived by inverting a hypothesis test. Consider the hypotheses H0 : θ ∈ Θ0(u) and H1 : θ ∈ Θ1(u) defined by

Θ0(u) = {θ ∈ Θ : g(θ) = u}
Θ1(u) = {θ ∈ Θ : g(θ) ≠ u}.
The parameter-space subsets Θ0(u) and Θ1(u) are easily re-characterized as

Θ0(u) = {θ ∈ Θ : θ2 − θ3u = 0}
Θ1(u) = {θ ∈ Θ : θ2 − θ3u ≠ 0},

whose defining conditions parallel the linear transformation of the statistic θ̂(x) to θ̂2(x) − θ̂3(x)u ~ G(θ2 − θ3u, σ22 − 2σ23u + σ33u²), having written σij for the (i, j)-entry of Σ. Such observations motivate the use of
W_u(x) = {θ̂2(x) − θ̂3(x)u} / √(σ22 − 2σ23u + σ33u²)

as a test statistic. A size-α version of this test is defined by the rule to reject H0 precisely when |W_u(x)| > z_{1-α/2} = Φ⁻¹(1 − α/2).
Inverting this test, the corresponding subset estimator collects the u such that H0 is not rejected,
C(x) = {u : |W_u(x)| ≤ z_{1-α/2}} = {u : ψ(u; x) ≤ 0},

where

ψ(u; x) = {θ̂2(x) − θ̂3(x)u}² − z²_{1-α/2}(σ22 − 2σ23u + σ33u²),

which is a second-order polynomial in u.
Observe that ψ(u; x) < 0 at least for u = θ̂2(x)/θ̂3(x), which means that C(x) is never empty. However, a graph of ψ(u; x) against u can take any of the shapes shown below.
[Figure: three possible shapes of the graph of ψ(u; x) against u, labeled (a), (b), and (c).]
Observe the following:
• The case (a) indicates that the subset estimator is an interval, C(x) = [L(x), U(x)].
• The case (b) indicates that the subset estimator is the entire real line, C(x) = (−∞, ∞).
• The case (c) indicates that the subset estimator is a union of disjoint half-intervals, C(x) = (−∞, U1(x)] ∪ [L2(x), ∞).
Cases (b) and (c) occur only when the coefficient of the quadratic term of ψ(u; x) is non-positive, θ̂3(x)² − z²_{1-α/2}σ33 ≤ 0. One should notice that this condition is equivalent to |W*(x)| ≤ z_{1-α/2}, where W*(x) = θ̂3(x)/√σ33, which is a familiar test statistic for a test of H0 : θ3 = 0 versus H1 : θ3 ≠ 0; in the size-α version of this test, one accepts H0 when |W*(x)| ≤ z_{1-α/2}. Yet accepting H0 : θ3 = 0 suggests the ratio g(θ) = θ2/θ3 is a rather unstable quantity.
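The classification into cases (a), (b), and (c) can be automated by examining the sign of the quadratic coefficient and the discriminant of the second-order polynomial above. A minimal sketch (the function name fieller and its return convention are our own; assumes scipy):

```python
import numpy as np
from scipy import stats

def fieller(th2, th3, s22, s23, s33, alpha=0.05):
    """Fieller's confidence set for the ratio theta2/theta3.

    Collects the u with psi(u; x) <= 0, where
    psi(u; x) = (th2 - th3*u)**2 - z**2 * (s22 - 2*s23*u + s33*u**2).
    Returns ("interval", (L, U)), ("whole line", None), or
    ("union of half-lines", (U1, L2)).
    """
    z = stats.norm.ppf(1 - alpha / 2)
    A = th3**2 - z**2 * s33              # quadratic coefficient
    B = -2 * (th2 * th3 - z**2 * s23)    # linear coefficient
    C = th2**2 - z**2 * s22              # constant coefficient
    disc = B**2 - 4 * A * C
    if disc > 0:
        r1, r2 = sorted(np.roots([A, B, C]).real)
        if A > 0:
            return "interval", (r1, r2)              # case (a)
        return "union of half-lines", (r1, r2)       # case (c)
    return "whole line", None                        # case (b)
```

When the quadratic coefficient is negative, the parabola opens downward and ψ(u; x) ≤ 0 outside the roots, which is exactly the disjoint half-interval case (c).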
Example: Gamma sampling. Suppose x = (X1, ..., Xn) is an IID random sample from a gamma distribution, in which each Xi ~ gamma(ν/2, θ). Let us treat the shape parameter, ν/2, as a fixed, known quantity, and the scale parameter, θ, as the parameter of interest. From previous deductions, we know that the sample mean also follows a gamma distribution, X̄ ~ gamma(nν/2, θ/n). A subset estimator with confidence coefficient 1 − α is therefore
C(x) = {θ > 0 : F⁻¹_{X̄,θ}(α1) < X̄ < F⁻¹_{X̄,θ}(1 − α2)},

where F⁻¹_{X̄,θ}(p) is the inverse-CDF of X̄ and α1 and α2 are values such that α1 + α2 = α.
The inverse-CDF F⁻¹_{X̄,θ}(p) does not, in general, have a closed form. However, it is possible to solve for θ in the above formula in the following way. Since the gamma distributions are a scale family, the distribution X̄ ~ gamma(nν/2, θ/n) may be regarded as a scale-transformation of a chi-square distribution, X̄ = θV/(2n), where V ~ gamma(nν/2, 2) = χ²_{nν}. It follows that F⁻¹_{X̄,θ}(p) = θχ²_{nν,p}/(2n), where χ²_{nν,p} denotes the p'th quantile of a chi-square random variable with nν degrees of freedom. Many statistical computer programs, and some calculators, include functionality for working with chi-square random variables.
The subset estimator, above, is now

C(x) = {θ > 0 : θχ²_{nν,α1}/(2n) < X̄ < θχ²_{nν,1-α2}/(2n)} = [2nX̄/χ²_{nν,1-α2}, 2nX̄/χ²_{nν,α1}].
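With the equal-split choice α1 = α2 = α/2, the interval can be computed from chi-square quantile routines, as the last display suggests. A minimal sketch (the function name gamma_scale_interval is our own; assumes scipy):

```python
import numpy as np
from scipy import stats

def gamma_scale_interval(x, nu, alpha=0.05):
    """Interval for the scale theta of gamma(nu/2, theta) data, nu known.

    Uses the pivot 2*n*xbar/theta ~ chi-square with n*nu degrees of
    freedom, splitting alpha evenly (alpha1 = alpha2 = alpha/2).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    lo_q = stats.chi2.ppf(alpha / 2, df=n * nu)
    hi_q = stats.chi2.ppf(1 - alpha / 2, df=n * nu)
    return 2 * n * xbar / hi_q, 2 * n * xbar / lo_q
```

Because the interval is linear in X̄, rescaling the data by a constant rescales both endpoints by the same constant, as the scale-family structure requires.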
Example: Poisson sampling. Suppose x = (X1, ..., Xn) is an IID random sample of Poisson count data such that each Xi ~ Poisson(θ). Define the statistic T(x) = Σᵢ Xᵢ, and recall that this statistic satisfies T(x) ~ Poisson(nθ). Recall the connection between the Poisson and Erlang distributions, by which
Pθ[T(x) ≥ t] = P[Y1 ≤ n],

where Y1 ~ gamma(t, 1/θ), assuming t is an integer value. Similarly,

Pθ[T(x) ≤ t] = 1 − P[Y2 ≤ n],

where Y2 ~ gamma(t + 1, 1/θ). Recall once again that the gamma distributions are a scale family, so Y1 ~ gamma(t, 1/θ) is Y1 = V1/(2θ), where V1 ~ χ²_{2t}, and Y2 ~ gamma(t + 1, 1/θ) is Y2 = V2/(2θ), where V2 ~ χ²_{2(t+1)}. Combining these observations, one sees that

Pθ[T(x) ≥ t] = P[Y1 ≤ n] = P[V1 ≤ 2θn]
Pθ[T(x) ≤ t] = 1 − P[Y2 ≤ n] = 1 − P[V2 ≤ 2θn].
The latter formula shows that the CDF of T(x), F_{T,θ}(t) = Pθ[T(x) ≤ t], is decreasing in θ; that is, the family of distributions of T(x) is stochastically monotone.
Write χ²_{ν,p} to denote the p'th quantile of a chi-square distribution with ν degrees of freedom. By the above, the function θL(t) = χ²_{2t,α1}/(2n) satisfies P_{θL(t)}[T(x) ≥ t] = α1, and θU(t) = χ²_{2(t+1),1-α2}/(2n) satisfies P_{θU(t)}[T(x) ≤ t] = α2. An interval estimator, C(x) = [L(x), U(x)], with confidence coefficient 1 − (α1 + α2), is therefore defined by the lower and upper bounds

L(x) = χ²_{2T(x),α1}/(2n) and U(x) = χ²_{2(T(x)+1),1-α2}/(2n).
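These bounds are straightforward to compute with a chi-square quantile routine. A minimal sketch (the function name poisson_interval is our own; assumes scipy), taking the lower bound to be 0 when T(x) = 0, where a chi-square with 0 degrees of freedom degenerates:

```python
from scipy import stats

def poisson_interval(t, n, alpha=0.05):
    """Exact interval for theta from T(x) = t, the sum of n Poisson(theta)
    counts.

    Lower bound from a chi-square with 2t degrees of freedom, upper bound
    from 2(t + 1) degrees of freedom, splitting alpha evenly.
    """
    lo = stats.chi2.ppf(alpha / 2, df=2 * t) / (2 * n) if t > 0 else 0.0
    hi = stats.chi2.ppf(1 - alpha / 2, df=2 * (t + 1)) / (2 * n)
    return lo, hi
```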
Example: Sampling in an exponential location family. Suppose x = (X1, ..., Xn) is an IID random sample whose common distribution is the location family generated from the exponential distribution with parameter β = 1. The PDF is

f_{X,θ}(x) = exp{−(x − θ)} for x ≥ θ, and f_{X,θ}(x) = 0 otherwise,

where θ is the location parameter.
One suitable pivot is Q1(θ; x) = X̄ − θ, whose distribution is Q1(θ; x) ~ gamma(n, 1/n). This is the distribution of the sample mean, Z̄, of an IID sample Z = (Z1, ..., Zn) with each Zi ~ exponential(1). Let a1 and b1 be such that P[Z̄ < a1] = α/2 and P[Z̄ > b1] = α/2, so that P[a1 < Z̄ < b1] = 1 − α. An interval estimator for θ, with confidence coefficient 1 − α, is

C1(x) = {θ : a1 < X̄ − θ < b1} = [X̄ − b1, X̄ − a1].
Another pivot is Q2(θ; x) = X(1) − θ, where X(1) = min x. Its distribution is Q2(θ; x) ~ gamma(1, 1/n), which is that of the first order statistic, Z(1), of an IID sample Z = (Z1, ..., Zn) with each Zi ~ exponential(1). Let a2 and b2 be such that P[Z(1) < a2] = α/2 and P[Z(1) > b2] = α/2, so that P[a2 < Z(1) < b2] = 1 − α. An interval estimator for θ, with confidence coefficient 1 − α, is

C2(x) = {θ : a2 < X(1) − θ < b2} = [X(1) − b2, X(1) − a2].
Properties of the gamma distributions imply that Eθ[X̄] = θ + 1, a1 = χ²_{2n,α/2}/(2n), and b1 = χ²_{2n,1-α/2}/(2n), writing χ²_{ν,p} to denote the p'th quantile of a chi-square distribution with ν degrees of freedom. Similarly, Eθ[X(1)] = θ + 1/n, a2 = χ²_{2,α/2}/(2n), and b2 = χ²_{2,1-α/2}/(2n).
Thus, if we define the “expectation” of an interval estimator C(x) = [L(x), U (x)] as Eθ [C(x)] = [Eθ [L(x)], Eθ [U (x)]], then the expectations of the intervals above are
Eθ[C1(x)] = [θ + 1 − χ²_{2n,1-α/2}/(2n), θ + 1 − χ²_{2n,α/2}/(2n)]
Eθ[C2(x)] = [θ + 1/n − χ²_{2,1-α/2}/(2n), θ + 1/n − χ²_{2,α/2}/(2n)].

For instance, if n = 10 and α = 0.05, the expectations are

Eθ[C1(x)] = [θ − 0.71, θ + 0.52] and Eθ[C2(x)] = [θ − 0.27, θ + 0.10].
Both intervals have the same coverage probability, but the interval C2(x) is much narrower than C1(x), and would be preferred. This is unsurprising, since C2(x) pivots around a minimal sufficient statistic.
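The numerical comparison above can be reproduced directly from chi-square quantiles. A minimal sketch (the function name expected_intervals is our own; assumes scipy), reporting each expectation relative to θ:

```python
from scipy import stats

def expected_intervals(n=10, alpha=0.05):
    """Expectations of the two pivot-based intervals for the exponential
    location parameter, reported as offsets from theta."""
    a1 = stats.chi2.ppf(alpha / 2, df=2 * n) / (2 * n)
    b1 = stats.chi2.ppf(1 - alpha / 2, df=2 * n) / (2 * n)
    a2 = stats.chi2.ppf(alpha / 2, df=2) / (2 * n)
    b2 = stats.chi2.ppf(1 - alpha / 2, df=2) / (2 * n)
    EC1 = (1 - b1, 1 - a1)          # E[C1(x)] minus theta
    EC2 = (1 / n - b2, 1 / n - a2)  # E[C2(x)] minus theta
    return EC1, EC2
```

Running this with the defaults reproduces the figures quoted above and confirms that C2(x) is the narrower interval in expectation.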
Example: Double-exponential sampling. Suppose x = (X1, ..., Xn) is an IID random sample with each Xi ~ double-exponential(µ, β), and suppose the parameter is θ = (µ, β).
A pivot for constructing subset estimators for µ is Q(θ; x) = (M − µ)/MAD, where M and MAD are the median and the mean absolute deviation from the median of x. Recall that M and MAD form the MLE for θ = (µ, β). To see that the pivot's distribution does not depend on θ, it may be rewritten Q(θ; x) = MZ/MADZ, where MZ and MADZ are the median and mean absolute deviation from the median of an IID sample Z = (Z1, ..., Zn) with each Zi ~ double-exponential(0, 1). Recall also that double-exponential is a location-scale family, so each Xi = µ + βZi.
Let a and b be such that P[Q(θ; x) < a] = α/2 and P[Q(θ; x) > b] = α/2, so that P[a < Q(θ; x) < b] = 1 − α. An interval estimator for µ, with confidence coefficient 1 − α, is

C(x) = {µ : a < (M − µ)/MAD < b}
     = [M − b·MAD, M − a·MAD].
The distribution of Q(9; x) is quite non-standard, which makes it somewhat challenging to calculate the constants a and b. One approach is to calculate them numerically, by simulation. Using computer software, a sample Z = (Z1 , . . . , Zn ) is repeatedly generated, but at each repetition the quantity MZ /MADZ is calculated. After many, many repetitions, a and b are found as the α/2’th and (1 _ α/2)’th quantiles of the simulated MZ /MADZ .
To simulate an individual Zi ~ double-exponential(0, 1), recall that if U ~ uniform(0, 1), then −log U ~ exponential(1). Finally, Zi = V log U ~ double-exponential(0, 1), where V is a discrete random variable, independent of U, with two possible values, −1 and 1, each assigned probability 1/2.
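The simulation recipe above can be sketched as follows (the function name laplace_pivot_quantiles and the default number of repetitions are our own choices):

```python
import numpy as np

def laplace_pivot_quantiles(n, alpha=0.05, reps=100_000, seed=0):
    """Simulate the constants a and b: the alpha/2 and 1 - alpha/2
    quantiles of M_Z / MAD_Z over standard double-exponential samples
    of size n."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=(reps, n))
    v = rng.choice([-1.0, 1.0], size=(reps, n))
    z = v * np.log(u)                        # double-exponential(0, 1) draws
    m = np.median(z, axis=1)
    mad = np.mean(np.abs(z - m[:, None]), axis=1)
    q = m / mad                              # simulated pivot values
    return np.quantile(q, alpha / 2), np.quantile(q, 1 - alpha / 2)
```

With a and b in hand, the interval is [M − b·MAD, M − a·MAD] as above; by the symmetry of the double-exponential distribution, a and b are approximately equal in magnitude and opposite in sign.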
Example: Cauchy sampling. Suppose x = (X1, ..., Xn) is an IID random sample of Cauchy-distributed data with scale parameter σ = 1. That is, each Xi ~ Cauchy(θ, 1). Since we are working with a location family, we might think to construct a pivot from the median, say, as Q(θ; x) = M − θ, so that the subset estimator is
C(x) = {θ : −c < M − θ < c} = [M − c, M + c],
where c is the (1 − α/2)'th quantile of the distribution of MZ, the median of an IID sample Z = (Z1, ..., Zn) of standard Cauchy random variables.
However, this solution is not entirely satisfying, since the median statistic, M, is not a sufficient statistic for θ. We know this because the vector of order statistics, s(x) = (X(1), ..., X(n)), is a minimal sufficient statistic for θ, and these statistics cannot be recovered from M alone.
Another issue is that the order statistics contain an ancillary component, which can be seen by translating
s(x) = (X(1), ..., X(n)) to T(x) = (T1(x), C2(x), ..., Cn(x)),

where T1(x) = X(1), and each entry of C(x) = (C2(x), ..., Cn(x)) is defined by Ci(x) = X(i) − X(1). The statistic C(x) is sometimes called the "configuration" statistic, and it is ancillary for θ. (Can you see why? To do so, recall that we are working with a location family.)
A suitable subset estimator for θ arises from the following line of deduction.
• The joint PDF of the order statistics, s(x), is

f_{s,θ}(x(1), ..., x(n)) = n! f_{X,θ}(x(1)) ··· f_{X,θ}(x(n)), for x(1) ≤ ··· ≤ x(n),

where f_{X,θ}(x) = π⁻¹{1 + (x − θ)²}⁻¹ is the common PDF of each Xi. This is deduced by first noting that the CDF of s(x) is
F_{s,θ}(x(1), ..., x(n)) = Pθ[X(1) ≤ x(1), ..., X(n) ≤ x(n)]
= Σ_{(r1,...,rn)∈Rn} Pθ[X_{r1} ≤ x(1), ..., X_{rn} ≤ x(n)]
= n! F_{x,θ}(x(1), ..., x(n)),

where Rn is the set of permutations of 1, ..., n, of which there are n!, and F_{x,θ}(x) is the joint CDF of x. Taking derivatives yields the PDF above.
• The joint PDF of T (x) is
f_{T,θ}(t1, c2, ..., cn) = n! f_{X,θ}(t1) f_{X,θ}(t1 + c2) ··· f_{X,θ}(t1 + cn).

This is easily deduced from the transformation formula, upon noting that the Jacobian determinant of the transformation from s(x) to T(x) is one.
• The joint PDF of T(x) is alternatively written

f_{T,θ}(t1, c2, ..., cn) = n! f_Z(t1 − θ) f_Z(t1 + c2 − θ) ··· f_Z(t1 + cn − θ),

where f_Z(z) = π⁻¹(1 + z²)⁻¹ is the PDF of a standard Cauchy distribution. This is because we are working with a location family.
• The conditional PDF of T1(x) = X(1), given C(x) = (c2, ..., cn), is

f_{T1|C,θ}(t1 | c2, ..., cn) = f_{T,θ}(t1, c2, ..., cn) / ∫ f_{T,θ}(s, c2, ..., cn) ds,

where the integral in the denominator is taken over all real s.
Underlying these steps is the suggestion to invoke the Conditionality Principle, and decide that the conditional distribution of T1(x) given C(x) = c is the correct distribution on which to base the construction of a subset estimator. However, once that decision is made, the actual formulation of the subset estimator is simple, because f_{T1|C,θ}(t1 | c) is a univariate PDF in the form of a location family.
For example, a pivot for constructing subset estimators for θ is Q(θ; x) = T1(x) − θ. A suitable subset estimator is

C(x) = {θ : a < T1(x) − θ < b} = [T1(x) − b, T1(x) − a],

for constants a and b such that

Pθ[a < Q(θ; x) < b | C(x) = c] = 1 − α.
The confidence coefficient of this interval is 1 _ α, provided the concept of “confidence coefficient” is understood conditionally, given C(x) = c.
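As a numerical illustration of the conditional construction, the conditional density of the pivot can be tabulated on a grid and its equal-tail quantiles extracted. This is only a sketch under our own choices of grid and tail split (the function name conditional_interval is ours):

```python
import numpy as np

def conditional_interval(x, alpha=0.05):
    """Conditional interval for the Cauchy location parameter theta.

    The conditional density of the pivot Q = T1(x) - theta, given the
    configuration C(x) = c, is proportional to prod_i f_Z(t + c_i) as a
    function of t, where f_Z is the standard Cauchy PDF and c_1 = 0.
    The constants a and b are equal-tail quantiles of this density.
    """
    x = np.sort(np.asarray(x, dtype=float))
    t1, c = x[0], x - x[0]             # T1(x) and the configuration
    grid = np.linspace(-50.0, 50.0, 20001)
    # log of the (unnormalized) conditional density on the grid
    logf = -np.log1p((grid[:, None] + c[None, :]) ** 2).sum(axis=1)
    dens = np.exp(logf - logf.max())
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    a = grid[np.searchsorted(cdf, alpha / 2)]
    b = grid[np.searchsorted(cdf, 1 - alpha / 2)]
    return t1 - b, t1 - a
```

Because the construction conditions on the configuration, shifting every observation by a constant shifts the interval by the same constant, as a location-equivariant procedure should.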
2022-04-18