MATH3723 Statistical Theory
Session 2021–2022
Background. The Cauchy distribution is defined by its probability density function
\[
f(x) = \frac{1}{\pi(1+x^2)}, \qquad x \in \mathbb{R}.
\]
Here, the normalisation factor $1/\pi$ ensures that $\int_{\mathbb{R}} f(x)\,dx = 1$.
Indeed, recalling that $(\arctan x)' = \dfrac{1}{1+x^2}$, we have
\[
\int_{-\infty}^{\infty} \frac{dx}{1+x^2}
= 2\int_{0}^{\infty} \frac{dx}{1+x^2}
= 2\lim_{K\to\infty}\int_{0}^{K} \frac{dx}{1+x^2}
= 2\lim_{K\to\infty}\arctan K
= 2\cdot\frac{\pi}{2} = \pi.
\]
Note that the expected value of the Cauchy distribution is not defined, because $\mathsf{E}|X| = \infty$:
\[
\int_{-\infty}^{\infty} |x|\,f(x)\,dx
= 2\lim_{K\to\infty}\int_{0}^{K} \frac{x\,dx}{\pi(1+x^2)}
= \lim_{K\to\infty}\frac{1}{\pi}\,\ln(1+x^2)\Big|_{0}^{K}
= \lim_{K\to\infty}\frac{1}{\pi}\,\ln(1+K^2) = \infty.
\]
However, the median $M$ is well defined and, by symmetry, is given by $M = 0$, because
\[
\int_{0}^{\infty} f(x)\,dx = \int_{0}^{\infty} \frac{dx}{\pi(1+x^2)} = \frac{1}{\pi}\cdot\frac{\pi}{2} = \frac{1}{2}.
\]
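These facts can also be checked numerically in R (an optional sketch, not part of the background; dcauchy and pcauchy are R's built-in Cauchy density and distribution function):

# optional numerical checks of the background facts
integrate(dcauchy, -Inf, Inf)    # total probability mass: approximately 1
pcauchy(0)                       # P(X <= 0) = 0.5, so the median is 0
# the mean does not exist: the integral of |x| f(x) over [-K, K] equals log(1 + K^2) / pi,
# which grows without bound as the cutoff K increases
K = c(1e2, 1e4, 1e6)
log(1 + K^2) / pi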
Statistical model. Suppose we have an i.i.d. sample $X = (X_1,\dots,X_n)$ from the shifted Cauchy distribution with density $f(x\,|\,\theta) = f(x-\theta)$ ($\theta \in \mathbb{R}$), that is,
\[
f(x\,|\,\theta) = \frac{1}{\pi\bigl(1+(x-\theta)^2\bigr)}, \qquad x \in \mathbb{R}.
\]
We are interested in estimating the location parameter $\theta$. Arranging the observed values $X_1, X_2, \dots, X_n$ in ascending order, consider the order statistics
\[
X_{(1)} < X_{(2)} < \cdots < X_{(n)}.
\]
(Since the Cauchy distribution is continuous, with probability 1 there are no ties in the observations, so all the inequalities may be presumed to be strict.)
The sample median $M_n$ is defined as the value separating the order statistics into two “equal parts”; more precisely, if $n = 2k+1$ then $M_n := X_{(k+1)}$, so that
\[
X_{(1)} < \cdots < X_{(k)} < M_n = X_{(k+1)} < X_{(k+2)} < \cdots < X_{(2k+1)}.
\]
If $n = 2k$ then the usual convention is to set
\[
M_n := \frac{X_{(k)} + X_{(k+1)}}{2}.
\]
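For reference, R's built-in median function implements exactly this convention; a quick illustration with a toy sample of even size:

X = c(3.1, -0.4, 2.7, 5.0)   # toy sample with n = 2k = 4
sort(X)                      # order statistics: -0.4  2.7  3.1  5.0
median(X)                    # (X_(2) + X_(3))/2 = (2.7 + 3.1)/2 = 2.9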
In this practical, you will consider the following four estimators of the parameter $\theta$:
• $\hat\theta := \bar{X}_n = \dfrac{1}{n}\sum_{i=1}^{n} X_i$ (sample mean)
• $\hat\theta := M_n$ (sample median)
• $\hat\theta := M_n + \dfrac{2}{n}\,\ell'_\theta(\theta\,|\,X)\Big|_{\theta = M_n}$ (modified sample median)
• $\hat\theta :=$ maximum likelihood estimator (MLE)
The MLE $\hat\theta$ is the maximiser of the log-likelihood, i.e.
\[
\ell(\hat\theta\,|\,X) = \max_{\theta\in\mathbb{R}} \ell(\theta\,|\,X),
\qquad
\ell(\theta\,|\,X) = \ln f(X\,|\,\theta) = -n\ln\pi - \sum_{i=1}^{n} \ln\bigl(1+(X_i-\theta)^2\bigr).
\]
The coefficient $2/n$ in the estimator $\hat\theta$ (modified sample median) is explained by Theorem 8.3 in the lecture notes (see formula (8.23)) and by the fact that Fisher's information in this model is given by (see Appendix 1 below)
\[
I_n(\theta) = \mathsf{E}_\theta\bigl[(\ell'_\theta(\theta\,|\,X))^2\bigr] = \frac{n}{2}.
\]
The derivation of this formula is provided in the handout attached to the practical task.
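For orientation, here is a minimal R sketch computing all four estimators on one simulated sample (the variable names and the optim settings are arbitrary choices, and the modified sample median is taken to be the one-step correction $M_n + \frac{2}{n}\,\ell'_\theta(M_n\,|\,X)$ described above):

theta = 2; n = 100
X = rcauchy(n, location = theta)                    # simulated Cauchy sample

loglik = function(t) -n*log(pi) - sum(log(1 + (X - t)^2))   # log-likelihood l(theta|X)
score  = function(t) 2 * sum((X - t) / (1 + (X - t)^2))     # its derivative l'(theta|X)

theta1 = mean(X)                                    # sample mean
theta2 = median(X)                                  # sample median
theta3 = theta2 + (2/n) * score(theta2)             # modified sample median (one-step correction)
theta4 = optim(theta2, function(t) -loglik(t),      # MLE: optim minimises, hence the minus sign
               method = "Brent", lower = theta2 - 10, upper = theta2 + 10)$par
c(theta1, theta2, theta3, theta4)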
Task. The objective of the practical is to explore and compare the asymptotic properties of the four estimators above (where analytical calculations may not be possible).
To this end, use computer simulations to verify whether these estimators are consistent (i.e., $\hat\theta_n$ approaches the true value $\theta$ as $n \to \infty$) and also to assess their accuracy by evaluating, for different values of the sample size $n$, their mean square error $\mathrm{MSE}_\theta(\hat\theta_n) = \mathsf{E}_\theta(\hat\theta_n - \theta)^2$ and the coverage probability $\mathsf{P}_\theta(|\hat\theta_n - \theta| \le \varepsilon)$, say for $\varepsilon = 0.1$.
More specific guidelines:
1. Statistical software R should be used for simulations and statistical computations, including graphics. See http://www.maths.leeds.ac.uk/school/students/computing.html for some useful documentation on R. More help is available within R itself by clicking on “Help” in the menu and choosing “R functions” or “Html help”. In particular, you might find it useful to familiarise yourself with the entries Cauchy, plot, function, mean, median and sum, and to practise with these commands before starting your practical work.
2. Fix a certain value of $\theta$ and simulate your Cauchy samples using the command rcauchy.
3. To visually verify consistency of an estimator $\hat\theta_n = \hat\theta(X_n)$, one method is to sample the values $X_1, X_2, \dots, X_n$ sequentially and to plot the sequence of resulting values $\hat\theta_n$ as a (random) function of $n \in \mathbb{N}$. What behaviour of such a plot would you expect for a consistent estimator?
4. When assessing the quality of the estimators, the sequential approach may be too computationally demanding, so it is recommended to confine yourself to some representative (increasing) values of the sample size $n$, say $n = 10, 100, 500, 1000, \dots$
5. To evaluate numerically the mean square error and the coverage probability (for a given $n$), make use of the Monte Carlo method (based on the Law of Large Numbers), according to which the expected value $\mathsf{E}(Y)$ of a random variable $Y$ can be approximated by the sample mean $\bar{Y}_m = (Y_1 + \cdots + Y_m)/m$ of a (large) i.i.d. sample $Y = (Y_1, \dots, Y_m)$ from this distribution:
\[
\mathsf{E}(Y) \approx \frac{Y_1 + \cdots + Y_m}{m}.
\]
In practical applications, the number of replicates $m$ should be large enough so that any two different estimation runs yield reasonably close approximate values. (A minimal sketch of this approach is given after this list.)
6. To illustrate the maximum likelihood estimation (MLE) method, plot the log-likelihood function $\ell(\theta\,|\,X)$ as a function of the parameter $\theta$ for several different values of $n$ (and with the corresponding sample values $X_1, \dots, X_n$ fixed). This can be done using the R function plot, but first you will have to define the function $\ell(\theta\,|\,X)$ using suitable R commands (see the sketch after this list). Is there always a unique root of the likelihood equation $\ell'_\theta = 0$?
7. To calculate the MLE numerically, it is recommended to use the R command optim; note, however, that it minimises a given function.
8. Summarise your findings by drawing conclusions and giving recommendations.
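As mentioned in item 5, the following is a minimal Monte Carlo sketch for approximating the mean square error and the coverage probability; the sample median is used as an example, and the values of theta, n, m and eps are arbitrary choices.

theta = 2; n = 100           # true parameter value and sample size
m = 1000                     # number of Monte Carlo replicates
eps = 0.1                    # accuracy level for the coverage probability
Theta2 = numeric(m)
for (i in 1:m) {
  X = rcauchy(n, location = theta)   # a fresh Cauchy sample
  Theta2[i] = median(X)              # the estimator under study (here: the sample median)
}
MSE2 = mean((Theta2 - theta)^2)             # Monte Carlo estimate of the MSE
cover2 = mean(abs(Theta2 - theta) <= eps)   # Monte Carlo estimate of P(|theta_hat - theta| <= eps)
c(MSE = MSE2, coverage = cover2)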
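For items 6 and 7, here is a sketch of plotting the Cauchy log-likelihood and locating its maximum with optim; the plotting range, the starting point and the search interval are arbitrary choices.

theta = 2; n = 100
X = rcauchy(n, location = theta)
logL = function(t) -n*log(pi) - sum(log(1 + (X - t)^2))   # l(theta|X) for a single value of t
logLv = Vectorize(logL)                                   # vectorised version, needed for curve()
curve(logLv, from = theta - 10, to = theta + 10,
      xlab = "theta", ylab = "log-likelihood")
fit = optim(median(X), function(t) -logL(t),              # optim minimises, hence the minus sign
            method = "Brent", lower = -20, upper = 20)
abline(v = fit$par, lty = 2)                              # mark the location of the MLE
fit$par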
Appendix 1: Fisher’s information
Suppose $X = (X_1, \dots, X_n)$ is an i.i.d. sample from the shifted Cauchy distribution with density
\[
f(x\,|\,\theta) = \frac{1}{\pi\bigl(1+(x-\theta)^2\bigr)}, \qquad x \in \mathbb{R}. \tag{1}
\]
The likelihood is given by
\[
L(\theta\,|\,X) = f(X\,|\,\theta) = \prod_{i=1}^{n} f(X_i\,|\,\theta) = \prod_{i=1}^{n} \frac{1}{\pi\bigl(1+(X_i-\theta)^2\bigr)}.
\]
Hence, the log-likelihood and its partial derivative are easily calculated:
\[
\ell(\theta\,|\,X) = \ln L(\theta\,|\,X) = -n\ln\pi - \sum_{i=1}^{n} \ln\bigl(1+(X_i-\theta)^2\bigr),
\qquad
\ell'_\theta(\theta\,|\,X) = 2\sum_{i=1}^{n} \frac{X_i-\theta}{1+(X_i-\theta)^2}.
\]
Our aim is to calculate Fisher's information in this model by showing that
\[
I_n(\theta) = \frac{n}{2}. \tag{2}
\]
The fact that the information does not depend on the parameter $\theta$ should not be surprising in view of the shift nature of the distributional family (1).
By definition,
\[
I_n(\theta) = \mathsf{E}_\theta\bigl[(\ell'_\theta(\theta\,|\,X))^2\bigr].
\]
However, it is more convenient to use the information's additivity, $I_n(\theta) = n\,I(\theta)$ (see the lecture notes, Section 4.3, Theorem 4.4), where $I(\theta)$ is the information per observation,
\[
I(\theta) = \mathsf{E}_\theta\bigl[(\ell'_\theta(\theta\,|\,X_i))^2\bigr]
= \int_{-\infty}^{\infty} \left(\frac{2(x-\theta)}{1+(x-\theta)^2}\right)^{\!2} \frac{dx}{\pi\bigl(1+(x-\theta)^2\bigr)}.
\]
By a simple shift of the variable, $x \mapsto x - \theta$, this integral is rewritten as
\[
I(\theta) = \frac{4}{\pi}\int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^3}\,dx.
\]
Thus, we must evaluate the last integral; once it is shown to equal $\pi/8$, we get $I(\theta) = \frac{4}{\pi}\cdot\frac{\pi}{8} = \frac{1}{2}$ and hence $I_n(\theta) = n\,I(\theta) = \frac{n}{2}$, as claimed in (2). Namely, we have the following

Proposition.
\[
\int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^3}\,dx = \frac{\pi}{8}. \tag{3}
\]
Proof. Integrate in (3) by parts, using $\dfrac{x}{(1+x^2)^3} = \Bigl(-\dfrac{1}{4(1+x^2)^2}\Bigr)'$:
\[
\int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^3}\,dx
= \underbrace{\left[-\frac{x}{4(1+x^2)^2}\right]_{-\infty}^{\infty}}_{=0}
+ \frac{1}{4}\int_{-\infty}^{\infty} \frac{dx}{(1+x^2)^2}
= \frac{1}{4}\int_{-\infty}^{\infty} \frac{dx}{(1+x^2)^2}.
\]
Furthermore, decomposing and again integrating by parts, now using $\dfrac{x}{(1+x^2)^2} = \Bigl(-\dfrac{1}{2(1+x^2)}\Bigr)'$:
\[
\int_{-\infty}^{\infty} \frac{dx}{(1+x^2)^2}
= \int_{-\infty}^{\infty} \frac{dx}{1+x^2} - \int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^2}\,dx
= \pi - \underbrace{\left[-\frac{x}{2(1+x^2)}\right]_{-\infty}^{\infty}}_{=0}
- \frac{1}{2}\int_{-\infty}^{\infty} \frac{dx}{1+x^2}
= \pi - \frac{\pi}{2} = \frac{\pi}{2}.
\]
Hence
\[
\int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^3}\,dx = \frac{1}{4}\cdot\frac{\pi}{2} = \frac{\pi}{8},
\]
and (3) is proved.
Alternatively, using the substitution $x = \tan t$ ($-\frac{\pi}{2} < t < \frac{\pi}{2}$) and noting that
\[
dx = \frac{dt}{\cos^2 t}, \qquad 1+x^2 = \frac{1}{\cos^2 t},
\]
we obtain
\begin{align*}
\int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^3}\,dx
&= \int_{-\pi/2}^{\pi/2} \frac{\tan^2 t}{1/\cos^6 t} \cdot \frac{dt}{\cos^2 t}
= \int_{-\pi/2}^{\pi/2} \sin^2 t \cos^2 t\,dt \\
&= 2\int_{0}^{\pi/2} \sin^2 t \cos^2 t\,dt
&& [\,2\cos t \sin t = \sin 2t\,] \\
&= \frac{1}{2}\int_{0}^{\pi/2} \sin^2(2t)\,dt
&& [\text{change of variables } u = 2t\,] \\
&= \frac{1}{4}\int_{0}^{\pi} \sin^2 u\,du
&& [\,2\sin^2 u = 1 - \cos 2u\,] \\
&= \frac{1}{8}\int_{0}^{\pi} (1-\cos 2u)\,du
= \frac{1}{8}\cdot\pi = \frac{\pi}{8},
\end{align*}
as required.
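As an optional cross-check (not part of the derivation above), both the integral (3) and the value $I(\theta) = 1/2$ can be confirmed numerically in R:

# check of (3): the result should be close to pi/8 = 0.3926991
integrate(function(x) x^2 / (1 + x^2)^3, -Inf, Inf)
# check of the information per observation: E_theta[(l'_theta)^2] should be 1/2,
# whatever value of theta is used (here theta = 0)
theta = 0
integrate(function(x) (2*(x - theta) / (1 + (x - theta)^2))^2 * dcauchy(x, location = theta),
          -Inf, Inf)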
Appendix 2: Various hints on using R
##### Simulations:
> n = 10 # sample size
> theta = 2 # shift parameter
> X = rnorm(n, mean=theta, sd=1) # generating a random (normal) sample
# for Cauchy distribution, use rcauchy(n, location, scale)
> hist(X) # plotting the histogram of the data
> theta1 = mean(X) # estimation by the sample mean
> theta2 = median(X) # estimation by the median
##### How to plot the estimates dynamically? (consistency)
# This is simple for the mean:
> M1 = cumsum(X)/(1:n) # computes the sample means of the subsamples
# (X[1],...,X[k]) for k=1,...,n
> plot(M1, xlab="k", ylab="Estimator I", type="l") # plot of M1 ("l" stands for "line")
> abline(h=theta) # to compare with the true value
##### How to do the same for the median?
> Median = function(X) {z=vector(length=length(X));
for(k in 1:length(X)) z[k]=median(X[1:k]); z} # medians of the subsamples (X[1],...,X[k])
> M2=Median(X)
> plot(M2, xlab="k", ylab="Estimator II", type="l")
> abline(h=theta)
##### How to estimate the MSE?
> m = 1000 # number of replicates
> Theta1 = vector(length=m) # allocating space for iterations
> for(i in 1:m) {X = rnorm(n, mean=theta, sd=1); Theta1[i]=mean(X)}
# simulating m samples and computing the corresponding estimates of theta
> hist(Theta1) # plotting the histogram of Theta1
> MSE1 = mean((Theta1-theta)^2) # estimating MSE of the estimate theta1
> MSE1 # print MSE1
##### How to define the log-likelihood (for a given sample X) and plot it?
> logL = function(t) {ell=0; for (j in 1:n) ell=ell-(X[j]-t)^2; ell}
# (a stand-in based on the normal sample above; for the practical, replace the
#  body by the Cauchy log-likelihood formula)
> plot(logL, xlab="theta", ylab="log-likelihood", from=-20, to=20, type="l")
##### How to find the maximum of a function?
> theta = 2
> n = 100
> X = rnorm(n, mean=theta, sd=1)
> f = function(theta) sum((X-theta)^2)
> a = optim(0, f) # 0 = initial value for search (of min f)
> a$par # point of minimum
> a$value # minimum value
##### NOTES
# 1. ’optim’ searches for the MINIMUM value of the function.
# 2. The exact answer in the example above: a$par = mean(X).
# 3. There will be a warning message that the default method is unreliable;
#    indeed, accuracy may not be great.
# 4. Experiment further by specifying the method, e.g. "Brent", to see the
#    difference; but you will have to specify the search range, e.g.
#    > a = optim(0, f, method="Brent", lower=-10, upper=10)
# 5. Alternatively, you can use the function ’optimize’ instead of ’optim’;
#    this also needs specification of the range.
2022-04-27