Mathematical Statistics: STAT2001/5008

Published: 2023-05-18

Project

March 24, 2023

1 Introduction

This document provides guidelines and instructions for the project component of this unit. It sets out the expectations and deadline for the project, along with information on group formation.

1.1 Group formation

This is a group project, with each group consisting of two students. Only one submission is required per group. All group members must sign the assignment cover page available on Blackboard and attach it as the first page of the submission. Projects without a cover page will not be marked.

1.2 Submissions

The project is worth 20% of your final grade. Submit your response via the Turnitin tab on Blackboard by 27 May 2023. Your submission must include the R code in an appendix, illustrating the calculations performed in R.

2 Project

This section presents the objective, the available resources, and the questions you need to answer.

2.1 Objective

I have a list comprising 235,880 English words. Words in my list have 1 to 24 characters. Words that have 8 or more characters are referred to as large words. For example, the word STATISTICS has 10 characters and is therefore a large word.

The objective of this project is to estimate the proportion of large words in my list.

2.2 Data collection

You can select multiple samples, with or without replacement, from the list of words. The maximum sample size is 100 words for each sample. Samples can be drawn from my list of words at the site linked below. I am allowed only limited computation time on this site, so please use it wisely.

The full URL is https://probabilisticaggregation.shinyapps.io/SamplingFromWordLIst/
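
As a rough illustration of the difference between the two sampling schemes, the R sketch below draws from a small made-up vector rather than from the site itself. The vector `toy_list`, the seed and the sample size are assumptions chosen for illustration only; the actual samples must come from the site above.

```r
# Hypothetical stand-in for the word list; the real list is only available on the site.
toy_list <- c("serial", "unmined", "funnyman", "subboreal", "nuncupation",
              "amylodextrin", "antimicrobic", "Streptocarpus")

set.seed(2023)   # arbitrary seed, used only so the sketch is reproducible
n <- 5           # illustrative sample size (the project caps this at 100)

# Without replacement: each word can appear at most once in the sample.
srs_without <- sample(toy_list, size = n, replace = FALSE)

# With replacement: the same word may be drawn more than once.
srs_with <- sample(toy_list, size = n, replace = TRUE)

print(srs_without)
print(srs_with)
```

Which scheme is used matters for the distribution of the number of large words in the sample, so it is worth recording the choice alongside each sample collected.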

An example of a sample of size 10, drawn without replacement, is presented in Table 1. This sample contains 8 large words.

Word	Number of Characters	Word Type
Streptocarpus	13	Large
amylodextrin	12	Large
nuncupation	11	Large
unmined	7
unseemingly	11	Large
pseudohypertrophy	17	Large
funnyman	8	Large
antimicrobic	12	Large
subboreal	9	Large
serial	6

Table 1: A sample of 10 words selected from the list without replacement
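
The classification in Table 1 can be reproduced in R along the following lines. This is a minimal sketch: the variable names are illustrative, and the only rule used is the definition from Section 2.1 (8 or more characters).

```r
# Words from Table 1.
words <- c("Streptocarpus", "amylodextrin", "nuncupation", "unmined",
           "unseemingly", "pseudohypertrophy", "funnyman", "antimicrobic",
           "subboreal", "serial")

n_char   <- nchar(words)    # number of characters in each word
is_large <- n_char >= 8     # large word: 8 or more characters

print(data.frame(word = words, n_char = n_char, large = is_large))

n_large <- sum(is_large)    # number of large words in the sample
p_hat   <- mean(is_large)   # sample proportion of large words
cat("Large words:", n_large, " Sample proportion:", p_hat, "\n")
```

For the sample in Table 1 this gives 8 large words and a sample proportion of 0.8.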

2.3 Questions

The response to the questions must be submitted as a typed document using any system of your choice. All calculations must be performed in R and included in the main report’s appendix. The appendix should contain the code and the output, each clearly annotated.

1. Let P be the proportion of large words in the sample you propose to select. Specify two statistical distributions, f_P(p), that may be used to describe P. Clearly state the name of each distribution, its known parameters and its unknown parameters θ. [4 Marks]

2. For each distribution specified above,

(a) State E(P), Var(P) and M_P(t).

(b) Using simulation (or otherwise), sketch the probability function of P for different values of the parameters θ. Comment on the shape of the distribution.

If the moments or the moment generating function do not exist for your choice of distributions, state in your submission that the respective quantities do not exist. [4 Marks]

3. Outline the methodology for estimating the proportion of large words in the list using each distribution specified in part 1. You must specify how the data will be collected and how your estimate of the proportion of large words will be computed. Justify your choice of methodology. [6 Marks]

4. Now collect the data using the link provided in Section 2.2, in accordance with the distributions and methodology proposed in the previous part, and estimate the proportion of large words. Comment on the accuracy of each estimate. [6 Marks]
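
For question 2(b), a simulation along the following lines is one possible starting point. The sketch assumes, purely for illustration, that words are sampled with replacement, so that the number of large words in a sample of size n is Binomial(n, θ) and P is that count divided by n. This is only one modelling choice and is not prescribed by the project. The values of θ, the seed and the number of simulations are likewise arbitrary.

```r
set.seed(1)                    # arbitrary seed, for reproducibility only
n      <- 100                  # sample size (the maximum allowed per sample)
thetas <- c(0.2, 0.5, 0.8)     # illustrative values of theta, not estimates
n_sim  <- 10000                # number of simulated samples per theta

# For each theta, compare simulated moments of P with E(P) = theta and
# Var(P) = theta * (1 - theta) / n, which hold under the binomial assumption.
sim_summary <- sapply(thetas, function(theta) {
  p <- rbinom(n_sim, size = n, prob = theta) / n   # simulated sample proportions
  c(theta = theta,
    sim_mean = mean(p), theory_mean = theta,
    sim_var  = var(p),  theory_var  = theta * (1 - theta) / n)
})
print(round(t(sim_summary), 5))

# Sketch the simulated probability function of P for one value of theta.
p <- rbinom(n_sim, size = n, prob = 0.5) / n
plot(table(p) / n_sim, xlab = "p", ylab = "Relative frequency",
     main = "Simulated distribution of P (theta = 0.5, n = 100)")
```

Repeating the plot for the other values of θ shows how the location and skewness of the distribution change with the parameter.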

2.4 Reality Check

After the due date, the proportion of large words will be made available. There is a prize for the best estimate (based on accuracy and cost).
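
Until the true proportion is released, one common way to gauge the accuracy of an estimate is its standard error. The sketch below applies the usual plug-in formulas for a sample proportion to the sample in Table 1. It is an illustration under the assumption of simple random sampling, not a prescribed method, and the variable names are illustrative.

```r
N     <- 235880     # number of words in the list (Section 2.1)
n     <- 10         # size of the sample in Table 1
p_hat <- 8 / 10     # sample proportion of large words in Table 1

# Plug-in standard error if the sample had been drawn with replacement.
se_with <- sqrt(p_hat * (1 - p_hat) / n)

# Without replacement, the finite population correction (N - n) / (N - 1) applies.
fpc        <- (N - n) / (N - 1)
se_without <- sqrt(p_hat * (1 - p_hat) / n * fpc)

cat("SE (with replacement):   ", round(se_with, 4), "\n")
cat("SE (without replacement):", round(se_without, 4), "\n")
```

Because N is so much larger than n, the correction factor is very close to 1 and the two standard errors are nearly identical, at about 0.126 for a sample of only 10 words. Larger samples reduce the standard error but use more of the limited computation time on the site.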