
MATH3723 Statistical Theory

Session 2021–2022

Background. The Cauchy distribution is defined by its probability density function

f(x) = 1/(π(1 + x²)),    x ∈ ℝ.

Here, the normalisation constant 1/π ensures that ∫_ℝ f(x) dx = 1.

Indeed, recalling that (arctan x)′ = 1/(1 + x²), we have

∫_{−∞}^{∞} dx/(1 + x²) = 2 ∫_{0}^{∞} dx/(1 + x²) = 2 lim_{K→∞} ∫_{0}^{K} dx/(1 + x²) = 2 lim_{K→∞} arctan K = 2 · (π/2) = π.
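As a quick numerical sanity check (not part of the notes), the normalisation can be verified in R with the built-in integrate function:

```r
# Numerical check that the Cauchy density integrates to 1
f <- function(x) 1 / (pi * (1 + x^2))
val <- integrate(f, lower = -Inf, upper = Inf)$value
val  # approximately 1
```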

Note that the Cauchy distribution has no finite expected value, since the absolute first moment diverges:

(1/π) ∫_{−∞}^{∞} |x|/(1 + x²) dx = (2/π) lim_{K→∞} ∫_{0}^{K} x/(1 + x²) dx = (1/π) lim_{K→∞} ln(1 + x²) |_{0}^{K} = (1/π) lim_{K→∞} ln(1 + K²) = ∞.

However, the median M is well defined and, by symmetry, is given by M = 0, because e.g.

∫_{0}^{∞} f(x) dx = ∫_{0}^{∞} dx/(π(1 + x²)) = 1/2.

Statistical model. Suppose we have an i.i.d. sample X = (X1, ..., Xn) from the shifted Cauchy distribution with density f(x|θ) = f(x − θ) (θ ∈ ℝ), that is,

f(x|θ) = 1/(π(1 + (x − θ)²)),    x ∈ ℝ.

We are interested in estimating the location parameter θ. Arranging the observed values X1, X2, ..., Xn in ascending order, consider the order statistics

X(1)  < X(2) < ··· < X(n) .

(Since the Cauchy distribution is continuous, with probability 1 there are no ties in the observations, so all the inequalities may be presumed to be strict.)

The sample median Mn is defined as the value separating the order statistics into two equal parts; more precisely, if n = 2k + 1 then Mn := X(k+1), so that

X(1) < ··· < X(k) < Mn = X(k+1) < X(k+2) < ··· < X(2k+1).

If n = 2k then the usual convention is to set Mn := (X(k) + X(k+1))/2.
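Incidentally, R's built-in median function already follows both conventions, as a quick check shows:

```r
# Odd n = 3: the middle order statistic
m.odd <- median(c(3, 1, 2))     # 2
# Even n = 4: the average of the two middle order statistics
m.even <- median(c(4, 1, 3, 2)) # 2.5
```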

In this practical, you will consider the following four estimators of the parameter θ:

θ̂1 := X̄n = (1/n) ∑_{i=1}^{n} Xi  (sample mean)

θ̂2 := Mn  (sample median)

θ̂3 := Mn + (2/n) ℓ′(Mn | X)  (modified sample median)

θ̂4 := maximum likelihood estimator (MLE)

The MLE θ̂4 is the maximiser of the log-likelihood, i.e.

ℓ(θ̂4 | X) = max_{θ∈ℝ} ℓ(θ | X),

where

ℓ(θ | X) = ln f(X | θ) = −n ln π − ∑_{i=1}^{n} ln(1 + (Xi − θ)²).
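A minimal R sketch of the four estimators might look as follows (the variable names theta1–theta4, the sample size, and the use of Brent's method are my own choices, not prescribed by the notes):

```r
set.seed(1)
n <- 101                                    # odd, so the sample median is an order statistic
theta <- 2                                  # true location parameter
X <- rcauchy(n, location = theta, scale = 1)

theta1 <- mean(X)                           # sample mean
theta2 <- median(X)                         # sample median

# score function l'(t | X) = 2 * sum((X - t) / (1 + (X - t)^2))
score <- function(t) 2 * sum((X - t) / (1 + (X - t)^2))
theta3 <- theta2 + (2 / n) * score(theta2)  # modified sample median (one step of size 2/n)

# MLE: optim minimises, so pass the NEGATIVE log-likelihood
negloglik <- function(t) n * log(pi) + sum(log(1 + (X - t)^2))
theta4 <- optim(theta2, negloglik, method = "Brent",
                lower = theta2 - 10, upper = theta2 + 10)$par
```

Starting the MLE search from the sample median is a convenient (hypothetical) choice, since the median is already a consistent estimator of θ.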

The coefficient 2/n in the estimator θ̂3 (modified sample median) is explained by Theorem 8.3 in the lecture notes (see formula (8.23)) and by the fact that Fisher's information in this model is given by (see Appendix 1 below)

In(θ) = E(ℓ′(θ | X))² = n/2,

so that 2/n = 1/In(θ). A derivation of this formula is provided in the handout attached to the practical task.

Task. The objective of the practical is to explore and compare the asymptotic properties of the four estimators above (where analytical calculations may not be possible).

To this end, use computer simulations to verify whether these estimators are consistent (i.e., θ̂n approaches the true value θ as n → ∞) and also to assess their accuracy by evaluating, for different values of the sample size n, their mean square error MSE(θ̂n) = E(θ̂n − θ)² and the coverage probability P(|θ̂n − θ| ≤ ε), say for ε = 0.1. More specific guidelines:

1. Statistical software R should be used for simulations and statistical computations including graphics. See http://www.maths.leeds.ac.uk/school/students/computing.html for some useful documentation on R. More help is available within R itself by clicking on "Help" in the menu and choosing "R functions" or "Html help". In particular, you might find it useful to familiarise yourself with the entries Cauchy, plot, function, mean, median and sum, and to practise with these commands before starting your practical work.

2. Fix a certain value of θ and simulate your Cauchy samples using the command rcauchy.

3. To visually verify consistency of an estimator θ̂n = θ̂(Xn), one method is to sample the values X1, X2, ..., Xn sequentially and to plot the sequence of resulting values θ̂n as a (random) function of n ∈ ℕ. What behaviour of such a plot would you expect for a consistent estimator?
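For instance, a running-median plot (one possible implementation, with θ = 2 and N = 1000 chosen arbitrarily) can be produced as follows:

```r
set.seed(42)
theta <- 2
N <- 1000
X <- rcauchy(N, location = theta, scale = 1)
run.med <- sapply(1:N, function(k) median(X[1:k]))  # theta_hat_n for n = 1,...,N
plot(run.med, type = "l", xlab = "n", ylab = "running sample median")
abline(h = theta, col = "red")  # a consistent estimator should settle near this line
```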

4. When assessing the quality of the estimators, the sequential approach may be too computationally demanding, so it is recommended to confine oneself to some representative (increasing) values of the sample size n, say n = 10, 100, 500, 1000, ...

5. To evaluate numerically the mean square error and the coverage probability (for a given n), make use of the Monte Carlo method (based on the Law of Large Numbers), according to which the expected value E(Y) of a random variable Y can be approximated by the sample mean Ȳm of a (large) i.i.d. sample Y = (Y1, ..., Ym) from this distribution:

E(Y) ≈ Ȳm = (Y1 + ··· + Ym)/m.

In practical applications, the number of replicates m should be large enough so that any two different estimation runs would yield reasonably close approximate values.
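As an illustration of this step (a sketch for the sample median only; the values of n, m and eps are arbitrary choices):

```r
set.seed(7)
theta <- 2; n <- 100; m <- 2000; eps <- 0.1
# m independent samples of size n, each reduced to its median
Theta2 <- replicate(m, median(rcauchy(n, location = theta)))
MSE2   <- mean((Theta2 - theta)^2)           # Monte Carlo estimate of E(theta_hat - theta)^2
cover2 <- mean(abs(Theta2 - theta) <= eps)   # Monte Carlo estimate of P(|theta_hat - theta| <= eps)
```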

6. To illustrate the maximum likelihood estimation (MLE) method, plot the log-likelihood function ℓ(θ | X) as a function of the parameter θ for several different values of n (and with the corresponding sample values X1, ..., Xn fixed). This can be done using the R function plot, but first you will have to define the function ℓ(θ | X) using suitable R commands. Is there always a unique root of the likelihood equation ℓ′(θ | X) = 0?

7. To calculate the MLE numerically, it is recommended to use the R command optim; note, however, that it minimises a given function.
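A possible invocation for the Cauchy model (a sketch; the starting point median(X) and the Brent search range are my own choices):

```r
set.seed(3)
theta <- 2; n <- 200
X <- rcauchy(n, location = theta)
# optim minimises, so pass the negative log-likelihood
negloglik <- function(t) n * log(pi) + sum(log(1 + (X - t)^2))
fit <- optim(median(X), negloglik, method = "Brent", lower = -10, upper = 10)
fit$par  # the MLE of theta
```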

8. Summarise your findings by drawing conclusions and recommendations.

Appendix 1: Fisher's information

Suppose X = (X1, ..., Xn) is an i.i.d. sample from the shifted Cauchy distribution with density

f(x|θ) = 1/(π(1 + (x − θ)²)),    x ∈ ℝ.                    (1)

The likelihood is given by

L(θ | X) = f(X | θ) = ∏_{i=1}^{n} f(Xi | θ) = 1/(πⁿ ∏_{i=1}^{n} (1 + (Xi − θ)²)).

Hence, the log-likelihood and its partial derivative are easily calculated:

ℓ(θ | X) = ln L(θ | X) = −n ln π − ∑_{i=1}^{n} ln(1 + (Xi − θ)²),

ℓ′(θ | X) = 2 ∑_{i=1}^{n} (Xi − θ)/(1 + (Xi − θ)²).

Our aim is to calculate Fisher's information in this model by showing that In(θ) = n/2.

The fact that the information does not depend on the parameter θ should not be surprising in view of the shift nature of the distributional family (1).

By definition,

In(θ) = E(ℓ′(θ | X))².

However, it is more convenient to use the information's additivity, In(θ) = n I(θ) (see the lecture notes, Section 4.3, Theorem 4.4), where I(θ) is the information per observation,

I(θ) = E(ℓ′(θ | Xi))² = ∫_{−∞}^{∞} (2(x − θ)/(1 + (x − θ)²))² · 1/(π(1 + (x − θ)²)) dx.

By a simple shift of the variable, x ↦ x − θ, this integral is rewritten as

I(θ) = E(ℓ′(θ | Xi))² = (4/π) ∫_{−∞}^{∞} x²/(1 + x²)³ dx.

Thus, we must evaluate the last integral. Namely, we have the following

Proposition.

∫_{−∞}^{∞} x²/(1 + x²)³ dx = π/8.                    (3)

Proof. Integrate in (3) by parts, noting that x/(1 + x²)³ dx = −(1/4) d((1 + x²)⁻²):

∫_{−∞}^{∞} x²/(1 + x²)³ dx = [−x/(4(1 + x²)²)]_{−∞}^{∞} + (1/4) ∫_{−∞}^{∞} dx/(1 + x²)² = (1/4) ∫_{−∞}^{∞} dx/(1 + x²)².

Furthermore, decomposing 1/(1 + x²)² = 1/(1 + x²) − x²/(1 + x²)² and again integrating by parts (now with x/(1 + x²)² dx = −(1/2) d((1 + x²)⁻¹)):

∫_{−∞}^{∞} dx/(1 + x²)² = ∫_{−∞}^{∞} dx/(1 + x²) − ∫_{−∞}^{∞} x²/(1 + x²)² dx
= π + (1/2)[x/(1 + x²)]_{−∞}^{∞} − (1/2) ∫_{−∞}^{∞} dx/(1 + x²)
= π − π/2 = π/2.

Hence ∫_{−∞}^{∞} x²/(1 + x²)³ dx = (1/4)(π/2) = π/8, and (3) is proved.
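The value in (3) can also be checked numerically in R (a quick sanity check, not part of the proof):

```r
# Numerical check of the Proposition: the integral equals pi/8 = 0.3926991...
val3 <- integrate(function(x) x^2 / (1 + x^2)^3, lower = -Inf, upper = Inf)$value
c(val3, pi / 8)
```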

Alternatively, using the substitution x = tan t (−π/2 < t < π/2) and noting that

dx = dt/cos² t,    1/(1 + x²) = cos² t,

we obtain

∫_{−∞}^{∞} x²/(1 + x²)³ dx = ∫_{−π/2}^{π/2} tan² t · cos⁶ t · dt/cos² t = ∫_{−π/2}^{π/2} sin² t cos² t dt

= 2 ∫_{0}^{π/2} sin² t cos² t dt    [2 cos t sin t = sin 2t]

= (1/2) ∫_{0}^{π/2} sin²(2t) dt    [change of variables u = 2t]

= (1/4) ∫_{0}^{π} sin² u du    [2 sin² u = 1 − cos 2u]

= (1/8) ∫_{0}^{π} (1 − cos 2u) du = π/8,

as required.

Appendix 2: Various hints on using R

##### Simulations:

> n = 10  # sample size

> theta = 2 # shift parameter

> X = rnorm(n, mean=theta, sd=1)  # generating a random (normal) sample

# for Cauchy distribution, use rcauchy(n, location, scale)

> hist(X)  # plotting the histogram of the data

> theta1 = mean(X)    # estimation by the sample mean

> theta2 = median(X)  # estimation by the median

##### How to plot the estimates dynamically? (consistency)

# This is simple for the mean:

> M1 = cumsum(X)/(1:n)   # computes the sample mean of the subsamples

# (X[1],...,X[k]) for k=1,...,n

> plot(M1, xlab="k", ylab="Estimator I", type="l")  # plot of M1

> abline(h=theta)  # to compare with the true value

##### How to do the same for the median?

> Median = function(X) {z=vector(length=n);

for(k in 1:n) z[k]=median(X[1:k]); z}

> M2=Median(X)

> plot(M2, xlab="k", ylab="Estimator II", type="l")

> abline(h=theta)

##### How to estimate the MSE?

> m = 1000 # number of replicates

> Theta1 = vector(length=m) # allocating space for iterations

> for(i in 1:m) {X = rnorm(n, mean=theta, sd=1); Theta1[i]=mean(X)}

# simulating m samples and computing the corresponding estimates of theta

> hist(Theta1) # plotting the histogram of Theta1

> MSE1 = mean((Theta1-theta)^2) # estimating MSE of the estimate theta1

> MSE1 # print MSE1

##### How to define the log-likelihood (for a given sample X) and plot it?

> logL = function(t) {ell=0; for (j in 1:n) ell=ell-(X[j]-t)^2; ell}

> plot(logL, xlab="theta", ylab="log-likelihood", from=-20, to=20, type="l")

##### How to find the maximum of a function?

> theta = 2

> n = 100

> X = rnorm(n, mean=theta, sd=1)

> f = function(theta) sum((X-theta)^2)

> a = optim(0, f)   # 0 = initial value for search (of min f)

> a$par   # point of minimum

> a$value # minimum value

##### NOTES

# 1. 'optim' searches for the MINIMUM value of the function.

# 2. The exact answer in the example above: a = mean(X).

# 3. There will be a warning message that the default method is unreliable;

#    indeed, the accuracy may not be great.

# 4. Experiment further by specifying the method, e.g. "Brent", to see the

#    difference; but you will have to specify the search range, e.g.

#    > a = optim(0, f, method="Brent", lower=-10, upper=10)

# 5. Alternatively, you can use the function 'optimize' instead of 'optim';

#    this also needs specification of the range.
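For example, optimize can be used as follows (a toy illustration with made-up data; the quadratic criterion is minimised at the sample mean):

```r
X <- c(1.3, 2.1, 0.7, 5.0, 1.8)      # toy data
f <- function(t) sum((X - t)^2)      # minimised at mean(X)
opt <- optimize(f, lower = -10, upper = 10)
opt$minimum  # close to mean(X)
```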