STAT3602 Statistical Inference / STAT6008 Advanced Statistical Inference Mini-project
Published: 2022-12-12
DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE
(assessment weighting: 15%)
Due date: 16 December, 2022
Complete Q1–Q4 using any convenient computer software of your choice.
Preliminary work
Let $(X_1, \dots, X_n)$ be an i.i.d. sample of observations drawn from a discrete distribution on $\{1, \dots, J\}$ such that, for each $i = 1, \dots, n$,
$$P(X_i = j) = p_j, \quad j = 1, \dots, J,$$
for some unknown probabilities $p_1, \dots, p_J$ satisfying $\sum_{j=1}^{J} p_j = 1$. Define, for $j = 1, \dots, J$,
$$N_j = \sum_{i=1}^{n} \mathbf{1}\{X_i = j\} = \text{number of observations in the sample which are equal to } j.$$
You may find the following fact useful for completing this project:
Let $\alpha_1, \dots, \alpha_r \ge 0$ and $B > 0$ be given constants. Subject to the constraints $\beta_1, \dots, \beta_r \ge 0$ and $\sum_{i=1}^{r} \beta_i = B$, the product $\prod_{i=1}^{r} \beta_i^{\alpha_i}$ is maximised by setting $\beta_i = B\alpha_i / \sum_{j=1}^{r} \alpha_j$, for $i = 1, \dots, r$.
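The fact can be checked numerically before it is used. The following is a minimal sketch, not a proof: the exponents in `alpha` and the helper names are made up for illustration, and the check simply compares the closed-form maximiser against randomly drawn feasible points.

```python
import math
import random

def weighted_product(beta, alpha):
    """Return prod_i beta_i ** alpha_i, computed through logarithms."""
    return math.exp(sum(a * math.log(b) for a, b in zip(alpha, beta) if a > 0))

def maximiser(alpha, B):
    """Closed-form maximiser beta_i = B * alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [B * a / total for a in alpha]

alpha = [3.0, 1.0, 2.0]   # illustrative exponents (made up)
B = 1.0
best = maximiser(alpha, B)
best_value = weighted_product(best, alpha)

# Compare against random feasible points (positive, rescaled to sum to B).
rng = random.Random(0)
for _ in range(1000):
    raw = [rng.random() + 1e-12 for _ in alpha]
    beta = [B * r / sum(raw) for r in raw]
    assert weighted_product(beta, alpha) <= best_value + 1e-12
```

In the likelihood setting, the exponents play the role of the counts $N_j$ and the $\beta_i$ play the role of the probabilities.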
Q1. (2.5%)
Let $a_1, \dots, a_K$ ($K < J$) be fixed positive constants. Suppose that the probabilities $p_j$'s are assumed to satisfy the constraint
$$p_1/a_1 = \dots = p_K/a_K.$$
Show that subject to the above constraint, the likelihood function $\prod_{i=1}^{n} p_{X_i}$ is maximised by setting
$$p_j = \begin{cases} \dfrac{a_j \sum_{i=1}^{K} N_i}{n \sum_{i=1}^{K} a_i}, & j = 1, \dots, K, \\[2ex] N_j / n, & j = K+1, \dots, J. \end{cases}$$
Hint: Let $p_1/a_1 = \dots = p_K/a_K = c$. Show that the problem reduces to maximisation of
$$c^{\sum_{i=1}^{K} N_i} \prod_{j=K+1}^{J} p_j^{N_j}$$
with respect to $p_{K+1}, \dots, p_J$, $c > 0$, subject to the constraint
$$\sum_{i=K+1}^{J} p_i = 1 - c \sum_{i=1}^{K} a_i.$$
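Once the Q1 estimator has been derived, it is a one-line computation. The sketch below is illustrative only: the function name `mle_ratio_constraint` and the toy numbers are assumptions, and the function returns the total probability of the remaining words rather than each individual $N_j/n$.

```python
def mle_ratio_constraint(top_counts, n, a):
    """Constrained MLE of p_1..p_K under p_1/a_1 = ... = p_K/a_K.

    top_counts: counts N_1..N_K; n: sample size; a: constants a_1..a_K.
    Returns (p_top, other_mass), where other_mass = sum_{j>K} p_j; each
    remaining p_j is simply N_j / n.
    """
    S = sum(top_counts)          # total count of the K constrained categories
    A = sum(a)                   # normalising constant of the benchmark ratios
    p_top = [aj * S / (n * A) for aj in a]
    return p_top, (n - S) / n

# Toy example (made-up numbers): K = 2, n = 100.
p_top, other = mle_ratio_constraint([30, 10], 100, [1.0, 3.0])
```

The estimates keep the total mass $\sum_{i \le K} N_i / n$ of the constrained categories and redistribute it in proportion to the $a_j$.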
Q2. (2.5%)
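Let $a_1, \dots, a_K$ ($K < J$) be fixed positive constants satisfying $\sum_{i=1}^{K} a_i < 1$. Suppose that the probabilities $p_j$'s are assumed to satisfy the constraints
$$p_i \ge a_i \quad \text{for } i = 1, \dots, K.$$
Show that subject to the above constraints, the likelihood function $\prod_{i=1}^{n} p_{X_i}$ is maximised by setting
$$p_j = \begin{cases} \max\left\{ \dfrac{Z N_j}{\sum_{i=K+1}^{J} N_i},\, a_j \right\}, & j = 1, \dots, K, \\[2ex] \dfrac{Z N_j}{\sum_{i=K+1}^{J} N_i}, & j = K+1, \dots, J, \end{cases}$$
where $Z$ solves the equation
$$\sum_{j=1}^{K} \max\left\{ \dfrac{Z N_j}{\sum_{i=K+1}^{J} N_i},\, a_j \right\} + Z - 1 = 0.$$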
Hint: First show that with $(p_1, \dots, p_K)$ fixed, the likelihood is maximised by setting
$$p_j = \frac{N_j}{\sum_{i=K+1}^{J} N_i} \left( 1 - \sum_{i=1}^{K} p_i \right), \quad j = K+1, \dots, J.$$
Next, proceed to maximise
$$G(p_1, \dots, p_K) \triangleq \sum_{j=1}^{K} N_j \ln p_j + \left( \sum_{j=K+1}^{J} N_j \right) \ln \left( 1 - \sum_{j=1}^{K} p_j \right)$$
w.r.t. $p_1, \dots, p_K$, using partial differentiation, taking into consideration the constraints $p_j \ge a_j$ for $j = 1, \dots, K$.
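A sketch of how the differentiation step might go, under the shorthand $M = \sum_{j=K+1}^{J} N_j$ and $Z = 1 - \sum_{j=1}^{K} p_j$ (the symbol $M$ is introduced here for brevity and is not part of the question). This is an outline of one possible argument, not a complete solution:

```latex
\frac{\partial G}{\partial p_j} = \frac{N_j}{p_j} - \frac{M}{Z},
\qquad j = 1, \dots, K.
% G is concave in (p_1, ..., p_K), so at the constrained maximum each j satisfies
%   either  p_j > a_j  and  \partial G / \partial p_j = 0,   i.e.  p_j = Z N_j / M,
%   or      p_j = a_j  and  \partial G / \partial p_j \le 0, i.e.  Z N_j / M \le a_j.
% The two cases combine into
p_j = \max\Bigl\{ \frac{Z N_j}{M},\, a_j \Bigr\},
\qquad
\sum_{j=1}^{K} \max\Bigl\{ \frac{Z N_j}{M},\, a_j \Bigr\} + Z - 1 = 0,
% where the second identity restates the definition of Z.
```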
Real data problem
The Word file text.docx contains two texts:
• (in Chinese, p.1–5) an extract from “倚天屠龍記” (The Heaven Sword and Dragon Saber), written by 金庸 (Jin Yong) in 1961, consisting of 8046 characters, with all punctuation removed;
• (in English, p.6–18) an extract from “The Daughter of Time”, written by Josephine Tey in 1951, consisting of 9413 words.
For this project, you may choose to work on either the Chinese text or the English text.
For simplicity, in what follows the term “word” refers to either a Chinese character or an English word. The following tables list the top ten most commonly used Chinese and English words, together
with their usage rates:
(source: Chinese Character Frequency, https://humanum.arts.cuhk.edu.hk/Lexis/chifreq)

| Word | 的 | 一 | 是 | 不 | 人 | 有 | 在 | 了 | 我 | 中 |
|---|---|---|---|---|---|---|---|---|---|---|
| Usage rate (%) | 3.6800 | 1.6830 | 1.4020 | 1.3850 | 1.1490 | 1.1020 | 0.9324 | 0.7592 | 0.7482 | 0.6201 |
(source: English Word Frequency, https://www.kaggle.com/datasets/rtatman/english-word-frequency)

| Word | the | of | and | to | a | in | for | is | on | that |
|---|---|---|---|---|---|---|---|---|---|---|
| Usage rate (%) | 3.9338 | 2.2363 | 2.2100 | 2.0637 | 1.5441 | 1.4401 | 1.0089 | 0.8001 | 0.6377 | 0.5781 |
Based on the sample text in text.docx, we wish to compare the author’s usage of the above 10 most commonly used words against the “benchmark” usage rates given in the above tables. Word frequencies of a text can be easily extracted online from the websites
https://www.browserling.com/tools/letter-frequency (for characters)
https://www.browserling.com/tools/word-frequency (for words).
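The counts can also be obtained offline. The sketch below assumes the text has already been copied out of text.docx into a Python string; the function name `top_word_counts` and the tokenisation rules (one token per CJK character, lower-cased runs of letters and apostrophes for English) are assumptions, so the totals may differ slightly from the 8046 and 9413 quoted above.

```python
from collections import Counter
import re

def top_word_counts(text, words, chinese=False):
    """Return (counts for the listed words, total number of tokens)."""
    if chinese:
        tokens = re.findall(r"[\u4e00-\u9fff]", text)    # one token per CJK character
    else:
        tokens = re.findall(r"[a-z']+", text.lower())    # lower-cased English words
    freq = Counter(tokens)
    return [freq[w] for w in words], len(tokens)

# Toy usage on a made-up sentence.
counts, n = top_word_counts(
    "The cat sat on the mat, and the dog sat in the hall.",
    ["the", "of", "and"],
)
```

Only the counts of the ten benchmark words and the total length `n` are needed for the tests that follow.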
Assume that all the words in the sample text are independently and identically distributed over the entire vocabulary, which can be coded as a finite set $\{1, 2, \dots, J\}$ (it is not necessary to specify $J$). Without loss of generality, we may take $\{1, 2, \dots, 10\}$ to be the set of the 10 most commonly used words. For $j = 1, \dots, J$, denote by $p_j$ the probability of the author’s using the word $j$.
Q3. (5%)
Let $a_1, \dots, a_{10}$ denote the benchmark usage rates of the 10 most commonly used words. We wish to test
$$H_0 : p_1/a_1 = \dots = p_{10}/a_{10} \quad \text{vs} \quad H_1 : \text{no restriction}.$$
(a) Give a layman’s interpretation of the null hypothesis $H_0$.
(b) Using the results obtained in Q1, conduct a generalised likelihood ratio (GLR) test and report a p-value. You may approximate the null distribution of the GLR test statistic by a chi-square distribution on an appropriate number of degrees of freedom. Does the sample text show evidence against $H_0$?
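The GLR computation in (b) can be sketched as follows. The function names are hypothetical. The statistic is written as twice the difference between the unrestricted and the constrained maximised log-likelihoods, using the Q1 estimates; taking the degrees of freedom to be $K - 1$ (the number of independent equality restrictions in $H_0$) is an assumption of this sketch and should be justified in the report. The p-value helper relies on SciPy, which is imported lazily.

```python
import math

def glr_statistic_ratio(top_counts, a):
    """GLR statistic 2 * (unrestricted loglik - loglik under H0) for Q3.

    Terms for words outside the top K cancel, because their estimates
    N_j / n are the same under H0 and under H1; n cancels as well.
    """
    S = sum(top_counts)
    A = sum(a)
    stat = 0.0
    for Nj, aj in zip(top_counts, a):
        if Nj > 0:                       # convention: 0 * log(0) = 0
            stat += Nj * math.log(Nj * A / (aj * S))
    return 2.0 * stat

def chi2_p_value(stat, df):
    """Upper-tail chi-square probability; requires SciPy."""
    from scipy.stats import chi2
    return chi2.sf(stat, df)

# Toy check (made-up numbers): counts proportional to a give a statistic of 0.
zero_stat = glr_statistic_ratio([10, 30], [1.0, 3.0])
toy_stat = glr_statistic_ratio([30, 10], [1.0, 3.0])
```

With the ten real counts and benchmark rates, the call would be `chi2_p_value(glr_statistic_ratio(counts, a), df=len(a) - 1)`.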
(c) Instead of the chi-square distribution, the bootstrap method may be used to provide an alternative approximation to the null distribution of the GLR test statistic. For this purpose the bootstrap samples must be drawn in a way which respects the null hypothesis $H_0$. Thus, each bootstrap sample should be generated by (weighted) sampling with replacement from the sample text, where word $j$ should be drawn with an estimated probability $\hat{p}_j$, which can be taken to be the constrained maximum likelihood estimate of $p_j$ under $H_0$, $j = 1, \dots, J$.
Calculate the constrained maximum likelihood estimates $\hat{p}_j$’s. Based on these estimates, apply the bootstrap method to estimate the null distribution of the GLR test statistic, using 10000 bootstrap samples. Conduct the bootstrap test and report a p-value.
Hint: The GLR test statistic depends only on the counts of the 10 most commonly used words $\{1, 2, \dots, 10\}$ and the length of the text. Thus, it is not necessary to really generate a full bootstrap sample of the same length as the sample text, which is unnecessarily time-consuming. Try to exploit the relationship between the 10 word counts and a multinomial distribution to simplify your computing process.
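One way to act on the hint is to draw, for each bootstrap replicate, a single multinomial vector with eleven cells: the ten benchmark words plus one pooled cell for all other words. The sketch below assumes NumPy is available; the function names, the seed and the toy numbers are illustrative, not prescribed by the question.

```python
import numpy as np

def glr_ratio(counts, a):
    """Row-wise GLR statistic for the ratio hypothesis; counts has shape (B, K)."""
    counts = np.asarray(counts, dtype=float)
    a = np.asarray(a, dtype=float)
    S = counts.sum(axis=-1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = counts * a.sum() / (a * S)
    # Zero counts get ratio 1 so that they contribute 0 * log(1) = 0.
    safe = np.where(counts > 0, ratio, 1.0)
    return 2.0 * (counts * np.log(safe)).sum(axis=-1)

def bootstrap_ratio_test(top_counts, n, a, B=10000, seed=0):
    """Parametric bootstrap of the GLR statistic under H0 of Q3."""
    top_counts = np.asarray(top_counts, dtype=float)
    a = np.asarray(a, dtype=float)
    p_top = a * top_counts.sum() / (n * a.sum())     # constrained MLE under H0
    probs = np.append(p_top, 1.0 - p_top.sum())      # last cell: all other words
    rng = np.random.default_rng(seed)
    draws = rng.multinomial(int(n), probs, size=B)   # shape (B, K + 1)
    boot = glr_ratio(draws[:, :-1], a)
    observed = glr_ratio(top_counts[None, :], a)[0]
    p_value = float(np.mean(boot >= observed))
    return observed, boot, p_value

# Toy run (made-up numbers, small B for speed).
observed, boot, p_value = bootstrap_ratio_test([30, 10], 100, [1.0, 3.0], B=200)
```

Pooling the remaining words is valid here because the statistic depends only on the ten counts and on `n`, as the hint states.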
(d) Plot the cumulative distribution functions (cdf) of the bootstrap distribution obtained in (c) and the chi-square distribution used in (b) on the same diagram. How do the two cdf’s compare with each other?
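A possible way to draw the comparison is to compute the empirical cdf of the bootstrap statistics and overlay the chi-square cdf. The sketch below assumes NumPy, SciPy and Matplotlib are installed; `ecdf`, `plot_cdfs` and the output file name are hypothetical names chosen for illustration.

```python
import numpy as np

def ecdf(sample):
    """Return the sorted values x and the empirical cdf evaluated at them."""
    x = np.sort(np.asarray(sample, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def plot_cdfs(boot, df, path="cdf_comparison.png"):
    """Overlay the bootstrap ecdf and the chi-square cdf, then save the figure."""
    import matplotlib.pyplot as plt
    from scipy.stats import chi2
    x, y = ecdf(boot)
    plt.step(x, y, where="post", label="bootstrap")
    plt.plot(x, chi2.cdf(x, df), label=f"chi-square, df={df}")
    plt.xlabel("GLR statistic")
    plt.ylabel("cdf")
    plt.legend()
    plt.savefig(path)
    plt.close()

# Toy check of the ecdf helper.
x, y = ecdf([3.0, 1.0, 2.0])
```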
Q4. (5%)
As in Q3, let $a_1, \dots, a_{10}$ denote the benchmark usage rates of the 10 most commonly used words. We wish to test
$$H_0 : p_1 \ge a_1, \dots, p_{10} \ge a_{10} \quad \text{vs} \quad H_1 : \text{no restriction}.$$
(a) Give a layman’s interpretation of the null hypothesis $H_0$.
(b) Using the results obtained in Q2, calculate the observed value of the GLR test statistic.
Hint: The function
$$f(Z) \triangleq \sum_{j=1}^{K} \max\left\{ \frac{Z N_j}{\sum_{i=K+1}^{J} N_i},\, a_j \right\} + Z - 1$$
is a piecewise linear increasing function in $Z$, with $f(0) < 0$ and $f(Z) \to \infty$ as $Z \to \infty$. Thus, there exists a unique solution to the equation $f(Z) = 0$, which can be found numerically by any convenient equation solver.
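Because $f$ is increasing with $f(0) < 0$ and $f(1) > 0$, plain bisection on $[0, 1]$ is enough. The sketch below is one possible implementation: the function names are hypothetical, $M = n - \sum_{j \le K} N_j$ is shorthand for the total count of the other words and is assumed to be positive, and the rates `a` must be given as proportions rather than percentages.

```python
import math

def solve_z(top_counts, n, a, tol=1e-12):
    """Bisection root of f(Z) = sum_j max(Z * N_j / M, a_j) + Z - 1 on [0, 1]."""
    M = n - sum(top_counts)              # total count of the words outside the top K

    def f(z):
        return sum(max(z * Nj / M, aj) for Nj, aj in zip(top_counts, a)) + z - 1.0

    lo, hi = 0.0, 1.0                    # f(0) < 0 since sum(a) < 1, and f(1) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def glr_statistic_lower_bounds(top_counts, n, a):
    """GLR statistic for H0: p_j >= a_j (j <= K), using the Q2 estimates."""
    M = n - sum(top_counts)
    z = solve_z(top_counts, n, a)
    p_top = [max(z * Nj / M, aj) for Nj, aj in zip(top_counts, a)]
    stat = M * math.log(M / (n * z))     # pooled contribution of the words j > K
    for Nj, pj in zip(top_counts, p_top):
        if Nj > 0:                       # convention: 0 * log(0) = 0
            stat += Nj * math.log(Nj / (n * pj))
    return 2.0 * stat, z, p_top

# Toy checks (made-up numbers): no bound binding, then the first bound binding.
stat0, z0, _ = glr_statistic_lower_bounds([30, 10], 100, [0.2, 0.05])
stat1, z1, p1 = glr_statistic_lower_bounds([30, 10], 100, [0.4, 0.05])
```

When every unrestricted estimate $N_j/n$ already satisfies its bound, the solver returns $Z = M/n$ and the statistic is zero up to round-off.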
(c) We do not expect the chi-square distribution to be a valid approximation to the null distribution of the GLR test statistic for this problem. Why?
(d) As in Q3(c), calculate the constrained maximum likelihood estimates $\hat{p}_j$’s under $H_0$. Based on these estimates, apply the bootstrap method to estimate the null distribution of the GLR test statistic, using 10000 bootstrap samples. Plot the cdf of the bootstrap distribution. Conduct the bootstrap test and report a p-value.
Does the sample text show evidence against $H_0$?
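The bootstrap in (d) needs the equation for $Z$ to be solved once per replicate, which can be done for all replicates at once with a vectorised bisection. The following is a sketch under stated assumptions: NumPy is available, the function names are hypothetical, the other words are pooled into a single cell with probability $Z$, the total count $M$ of the other words is positive in every replicate, and statistics are rounded to 8 decimals so that round-off does not disturb comparisons at the point mass at zero.

```python
import numpy as np

def constrained_fit(counts, n, a, iters=60):
    """Vectorised bisection for Z, one root per row; counts has shape (B, K)."""
    counts = np.asarray(counts, dtype=float)
    a = np.asarray(a, dtype=float)
    M = n - counts.sum(axis=1)                     # counts of all other words
    lo = np.zeros(len(counts))
    hi = np.ones(len(counts))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f = np.maximum(mid[:, None] * counts / M[:, None], a).sum(axis=1) + mid - 1.0
        neg = f < 0.0
        lo = np.where(neg, mid, lo)
        hi = np.where(neg, hi, mid)
    z = 0.5 * (lo + hi)
    p_top = np.maximum(z[:, None] * counts / M[:, None], a)
    return z, p_top

def glr_lower_bounds(counts, n, a):
    """Row-wise GLR statistic for H0: p_j >= a_j, j = 1..K."""
    counts = np.asarray(counts, dtype=float)
    z, p_top = constrained_fit(counts, n, a)
    M = n - counts.sum(axis=1)
    safe = np.where(counts > 0, counts / (n * p_top), 1.0)
    stat = (counts * np.log(safe)).sum(axis=1) + M * np.log(M / (n * z))
    return np.round(2.0 * stat, 8), z, p_top

def bootstrap_lower_bound_test(top_counts, n, a, B=10000, seed=0):
    """Parametric bootstrap of the GLR statistic under H0 of Q4."""
    obs, z, p_top = glr_lower_bounds(np.asarray(top_counts)[None, :], n, a)
    probs = np.append(p_top[0], z[0])              # last cell: other words, mass Z
    probs = probs / probs.sum()                    # remove bisection round-off
    rng = np.random.default_rng(seed)
    draws = rng.multinomial(int(n), probs, size=B)
    boot, _, _ = glr_lower_bounds(draws[:, :-1], n, a)
    return obs[0], boot, float(np.mean(boot >= obs[0]))

# Toy run (made-up numbers, small B for speed).
obs4, boot4, p4 = bootstrap_lower_bound_test([30, 10], 100, [0.4, 0.05], B=200)
```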
* Points to note *
• In the main text of your report, show and explain your steps, and display formulae in their conventional mathematical form. Do not explain anything using computer code.
• Attach your computer code to your report as an appendix. Include brief comments on lines which involve complicated operations.