Mathematical Statistics: STAT2001/5008
Published: 2023-05-18
Project
March 24, 2023
1 Introduction
This document provides guidelines and instructions for the project component of this unit. It covers the expectations, the project deadline, and information on group formation.
1.1 Group formation
This is a group project, with each group consisting of two students. Only one submission is required per group. All group members must sign the assignment cover page available on Blackboard and attach it as the first page of the submission. Projects without a cover page will not be marked.
1.2 Submissions
The project is worth 20% of your final grade. Submit your response via the Turnitin tab on Blackboard by 27 May 2023. Your submission must include the R code in an appendix, illustrating the calculations performed in R.
2 Project
This section presents the objective, the available resources, and the questions you need to answer.
2.1 Objective
I have a list comprising 235,880 English words. Words in my list have 1 to 24 characters. Words that have 8 or more characters are referred to as large words. For example, the word STATISTICS has 10 characters and is therefore a large word.
The objective of this project is to estimate the proportion of large words in my list.
2.2 Data collection
You can select multiple samples with or without replacement from the list of words. The maximum sample size is 100 words for each sample. Samples can be drawn from my list of words available here. I am allowed only limited computation time on this site so please use it wisely.
The full URL is https://probabilisticaggregation.shinyapps.io/SamplingFromWordLIst/
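The difference between the two sampling schemes can be sketched in base R with `sample()`. This is only an illustration: `toy_list` is a hypothetical stand-in for the word list, which is available only through the site, and the sample size of 4 is arbitrary.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the word list (the real list is only on the site).
toy_list <- c("statistics", "serial", "funnyman", "unmined",
              "subboreal", "nuncupation")

# Without replacement: each word can appear at most once in the sample.
s_without <- sample(toy_list, size = 4, replace = FALSE)

# With replacement: the same word may be drawn more than once.
s_with <- sample(toy_list, size = 4, replace = TRUE)

print(s_without)
print(s_with)
```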
An example of a sample of size 10, drawn without replacement, is presented in Table 1. This sample contains 8 large words.
Word | Number of Characters | Word Type
Streptocarpus | 13 | Large
amylodextrin | 12 | Large
nuncupation | 11 | Large
unmined | 7 |
unseemingly | 11 | Large
pseudohypertrophy | 17 | Large
funnyman | 8 | Large
antimicrobic | 12 | Large
subboreal | 9 | Large
serial | 6 |
Table 1: A sample of 10 words selected from the list without replacement
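As a minimal sketch of how the word type and the sample proportion could be computed in R, the words of Table 1 can be entered by hand and classified with `nchar()`. The variable names (`words`, `n_chars`, `is_large`) are illustrative choices, not part of the project specification.

```r
# Words from Table 1 (a sample of size 10 drawn without replacement).
words <- c("Streptocarpus", "amylodextrin", "nuncupation", "unmined",
           "unseemingly", "pseudohypertrophy", "funnyman", "antimicrobic",
           "subboreal", "serial")

n_chars  <- nchar(words)   # number of characters in each word
is_large <- n_chars >= 8   # large word: 8 or more characters

n_large <- sum(is_large)   # number of large words in the sample
p_hat   <- mean(is_large)  # sample proportion of large words

print(data.frame(words, n_chars, is_large))
print(c(n_large = n_large, p_hat = p_hat))
```

For this sample the count is 8, in agreement with the table, so the sample proportion is 0.8.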
2.3 Questions
The responses to the questions must be submitted as a typed document, prepared using any system of your choice. All calculations must be performed in R and included in the appendix of the main report. The appendix should contain the code and the output, each clearly annotated.
1. Let P be the proportion of large words in the sample you propose to select. Specify two statistical distributions, f_P(p), that may be used to describe P. Clearly state the name of each distribution, its known parameters, and its unknown parameters θ. [4 Marks]
2. For each distribution specified above,
(a) State E(P), Var(P) and the moment generating function M_P(t).
(b) Sketch, by simulation or otherwise, the probability function of P for different values of the parameters θ. Comment on the shape of the distribution.
If the moments or the moment generating function do not exist for your choice of distribution, state this explicitly in your submission. [4 Marks]
3. Outline the methodology for estimating the proportion of large words in the list using each distribution specified in part 1. You must specify how the data will be collected and how your estimate of the proportion of large words will be computed. Justify the choice of the methodology proposed. [6 Marks]
4. Now collect the data using the link provided in section 2.2 in accordance with the distributions and methodology proposed in the previous part and estimate the proportion of large words. Comment on the accuracy of each estimate. [6 Marks]
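One possible way to approach the simulation and estimation steps is sketched here. It is not a model answer. It assumes that sampling with replacement makes the number of large words in a sample binomial, and that sampling without replacement makes it hypergeometric. The value `theta <- 0.5` is a hypothetical placeholder, not the true proportion. The helper `estimate_proportion()` is an illustrative name, and its normal-approximation interval is only rough for small samples such as the one in Table 1.

```r
set.seed(2023)             # fixed seed so the sketch is reproducible

N     <- 235880            # number of words in the list (Section 2.1)
n     <- 100               # maximum sample size allowed per sample
theta <- 0.5               # hypothetical proportion of large words, NOT the true value
K     <- round(N * theta)  # implied number of large words in the list

# Simulate the sample proportion P under each sampling scheme.
reps      <- 10000
p_with    <- rbinom(reps, size = n, prob = theta) / n   # with replacement
p_without <- rhyper(reps, m = K, n = N - K, k = n) / n  # without replacement

# Theoretical variances of P under the two assumed models.
var_with    <- theta * (1 - theta) / n
var_without <- var_with * (N - n) / (N - 1)  # finite population correction

# Compare simulated mean and variance with the theoretical variance.
print(c(mean = mean(p_with), var = var(p_with), theory = var_with))
print(c(mean = mean(p_without), var = var(p_without), theory = var_without))

# Point estimate and approximate interval from x large words in a sample of n.
estimate_proportion <- function(x, n, level = 0.95) {
  p_hat <- x / n
  se    <- sqrt(p_hat * (1 - p_hat) / n)
  z     <- qnorm(1 - (1 - level) / 2)
  c(estimate = p_hat,
    lower    = max(0, p_hat - z * se),
    upper    = min(1, p_hat + z * se))
}

# Illustration only: the Table 1 sample had 8 large words out of 10.
print(estimate_proportion(8, 10))
```

Because n is tiny relative to N, the finite population correction is close to 1, so the two simulated distributions should look almost identical. Repeating the simulation for several values of `theta` shows how the shape changes.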
2.4 Reality Check
After the due date, the proportion of large words will be made available. There is a prize for the best estimate (based on accuracy and cost).