STAT6086 Sampling Techniques

Assignment 2022-23

You must submit one electronic copy of your report in PDF by 11.59pm on Tuesday 10th January 2023. You must submit your electronic copy (a single file) via the STAT6093 Blackboard website using TurnItIn (in the Assignments folder). Your Student ID Number must appear in your electronic copy. Make sure that your assignment fits in a single PDF document. A scanned handwritten document is not allowed. Note that the file must be smaller than 10 MB. If your Word file is larger than this, try converting it to a readable PDF or saving your images (graphs, plots, …) as JPEG (instead of BMP).

It is the policy of the Department of Social Statistics that coursework should be anonymous; therefore, only your Student ID Number should appear in your Word or PDF document. To maintain anonymity, please do not put your name on any part of your submission. You must put your Student ID Number on the first page of your coursework.

Note that it is not acceptable to read and gain ideas for your coursework from another student’s finished work. It is very important that you carefully read the section “Academic Integrity and Referencing” in the module outline (available on Blackboard).

Make sure that you have 4 sections called Task 1, Task 2, Task 3 and Task 4. Each subsection should also be clearly labelled: 1a), 1b), ... 2a), 2b), 2c), ...

The maximum number of words is 6000.

Information about coursework submission, penalty for late submission, policy for over-length work, procedure for coursework extensions, feedback, and academic integrity and referencing can be found in the module outline (available on Blackboard). It is very important that you read the module outline carefully.

ASSIGNMENT

The target population consists of 1653 farms in Australia (file “OzFarm_Frame.xls” on Blackboard). For each farm, you have: (i) an ID, (ii) variable STATE, (iii) variable ZONE, (iv) variable REGION, (v) variable INDUSTRY and (vi) variable DSE. These variables are described below.

STATE:

1 New South Wales

2 Victoria

3 Queensland

4 South Australia

5 Western Australia

6 Tasmania

7 Northern Territory

ZONE:

1 Pastoral zone (inland)

2 Wheat-sheep zone (hinterland)

3 High rainfall zone (coastal)

REGION: Subdivision of State x Zone indicating a more homogeneous (in terms of climate, soil type etc.) farming area within a State. Three-digit code, with first digit = state, second digit = zone and third digit denoting the region.

INDUSTRY:

1 crops specialist farm

2 mixed livestock/crops farm

3 sheep farm

4 beef farm

5 sheep-beef farm

DSE: A measure of size of a farm in terms of its productive capacity. DSE stands for "Dry Sheep Equivalent" and is a linear combination of the reported numbers of sheep and beef cattle and hectares of crops area reported by the farm at the previous Agricultural Census.

TASK 1: (30%)

For this task, you need to use the size variable DSE to create strata.

1a) Create 4 strata using two different methods:

(i) the Dalenius and Hodges method (with classes of size 5000),

(ii) the cum() rule,

where the variable is the variable DSE. For each method, present the details of your calculation and any analytic expressions needed.

[10%]
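The cumulative square-root-of-frequency construction usually attributed to Dalenius and Hodges can be sketched as follows. This is only an illustration on toy data, not a solution: the function name `cum_sqrt_f_boundaries`, the choice of starting the classes at 0, and the "closest class" rule for picking boundaries are assumptions made for the example.

```python
import math

def cum_sqrt_f_boundaries(values, class_width, n_strata, start=0.0):
    """Return n_strata - 1 stratum boundaries from the cum sqrt(f) rule.

    Assumes every value is >= start; classes are [start + k*w, start + (k+1)*w).
    """
    n_classes = int((max(values) - start) // class_width) + 1
    # f[k] = number of units falling in class k
    freq = [0] * n_classes
    for v in values:
        freq[int((v - start) // class_width)] += 1
    # cumulate sqrt(f) over the classes
    cum, running = [], 0.0
    for f in freq:
        running += math.sqrt(f)
        cum.append(running)
    total = cum[-1]
    boundaries = []
    for h in range(1, n_strata):
        target = h * total / n_strata
        # take the class whose cumulative sqrt(f) is closest to the target
        k = min(range(n_classes), key=lambda j: abs(cum[j] - target))
        boundaries.append(start + (k + 1) * class_width)
    return boundaries

# Toy size variable (hypothetical values, not the farm data)
toy = [1, 2, 3, 4, 11, 12, 21, 31, 32, 33, 34, 35, 36, 37, 38, 39]
print(cum_sqrt_f_boundaries(toy, class_width=10, n_strata=2))
```

With very skewed data and many strata, two targets can fall in the same class; a real analysis would need to check for duplicate boundaries.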

1b) Suppose you want to select a sample of size . What would be the optimal allocation (according to the variable DSE) for the 2 methods of stratification (i) and (ii) described in 1a)? Provide the details of your calculation and any analytic expressions needed. [5%]
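Optimal (Neyman) allocation assigns stratum sample sizes proportional to N_h S_h, where N_h is the stratum size and S_h the stratum standard deviation of the size variable. A minimal sketch, assuming the full population values are available in each stratum (as they are for a frame variable); the function name and the toy strata are hypothetical.

```python
import statistics

def neyman_allocation(strata, n):
    """strata: dict label -> list of population values; n: total sample size.

    Returns unrounded n_h = n * N_h * S_h / sum(N_h * S_h),
    with S_h computed using the N_h - 1 divisor.
    """
    weight = {h: len(x) * statistics.stdev(x) for h, x in strata.items()}
    total = sum(weight.values())
    return {h: n * w / total for h, w in weight.items()}

# Toy strata (hypothetical values)
print(neyman_allocation({"a": [0, 2], "b": [0, 4]}, n=3))
```

The result is not rounded, and nothing stops n_h from exceeding N_h in a highly variable stratum; in that case the stratum is usually taken in full and the remainder reallocated.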

1c) Compute the variances of the mean of the variable DSE under the 2 methods of stratification (i) and (ii), when and under optimal allocation. Provide the formulae and details of your calculation. Which stratification method would you recommend? [5%]

1d) What would be the minimal sample size needed to achieve a coefficient of variation (CV) of 5% for the mean of the variable DSE, for the stratification obtained with the cum() rule and optimal allocation? What would be the minimal sample size if you use proportional allocation instead of optimal allocation? [10%]
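The sample size for a target CV follows from setting the variance of the stratified mean equal to V0 = (CV × mean)^2 and solving for n. With W_h = N_h / N, this gives n = (Σ W_h S_h)^2 / (V0 + Σ W_h S_h^2 / N) under optimal allocation and n = Σ W_h S_h^2 / (V0 + Σ W_h S_h^2 / N) under proportional allocation. The sketch below applies these two expressions to made-up stratum summaries; the function name and inputs are hypothetical.

```python
import math

def n_for_cv(N_h, S_h, ybar, cv, allocation="optimal"):
    """Smallest n giving CV(ybar_st) <= cv, with the finite population correction."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    v0 = (cv * ybar) ** 2  # target variance of the estimated mean
    sum_ws2 = sum(w * s ** 2 for w, s in zip(W, S_h))
    if allocation == "optimal":
        numer = sum(w * s for w, s in zip(W, S_h)) ** 2
    else:  # proportional allocation
        numer = sum_ws2
    return math.ceil(numer / (v0 + sum_ws2 / N))

# Made-up stratum sizes and standard deviations
print(n_for_cv([50, 50], [10, 30], ybar=100, cv=0.05, allocation="optimal"))
print(n_for_cv([50, 50], [10, 30], ybar=100, cv=0.05, allocation="proportional"))
```

Optimal allocation never needs a larger n than proportional allocation, because (Σ W_h S_h)^2 ≤ Σ W_h S_h^2.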

TASK 2: (45%)

A stratified sample of units has been selected from the 1653 farms in Australia. The stratification is given by the variable ZONE (3 strata). The sample data can be found in the file “OzFarm_Sample.xls” on Blackboard. You will see that the sample data contain the following additional variables:

TCC Total Cash Costs of farm over financial year

TCR Total Cash Receipts of farm over financial year (A$)

EQUITY Value of farm assets less farm debt at end of financial year (A$)

DEBT Farm debt at end of financial year (A$)

2a) Which allocation has been used? Explain your answer. [3%]

2b) Estimate the population mean of TCC. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and the details of your calculation. [5%]
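For a stratified simple random sample, the usual estimator of the mean is Σ W_h ȳ_h with estimated variance Σ W_h^2 (1 − n_h/N_h) s_h^2 / n_h. A minimal sketch on toy numbers, assuming a normal-approximation interval with z = 1.96; the function name and the data are hypothetical.

```python
import math
import statistics

def stratified_mean(samples, N_h, z=1.96):
    """samples: dict stratum -> sampled values; N_h: dict stratum -> population size.

    Returns (estimate, variance estimate, confidence interval).
    """
    N = sum(N_h.values())
    est, var = 0.0, 0.0
    for h, y in samples.items():
        W = N_h[h] / N
        n = len(y)
        est += W * statistics.mean(y)
        # (1 - n/N_h) is the finite population correction in stratum h
        var += W ** 2 * (1 - n / N_h[h]) * statistics.variance(y) / n
    half = z * math.sqrt(var)
    return est, var, (est - half, est + half)

# Toy sample (hypothetical values)
print(stratified_mean({1: [10, 12, 14], 2: [20, 30]}, {1: 30, 2: 20}))
```

The same function can be applied to a 0/1 indicator to estimate a proportion, since the sample variance of an indicator equals n p(1 − p)/(n − 1).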

2c) Estimate the population proportion of farms with DEBT < EQUITY. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and the details of your calculation. [5%]

2d) Estimate the population mean of TCC using the separate ratio estimator, with DSE as auxiliary variable. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and the details of your calculation. [7%]

2e) Estimate the population mean of TCC using the combined ratio estimator, with DSE as auxiliary variable. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and the details of your calculation. Compare your results with 2d). [10%]
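The separate ratio estimator uses one ratio per stratum, Σ W_h (ȳ_h / x̄_h) X̄_h, whereas the combined ratio estimator uses a single ratio of stratified means, (ȳ_st / x̄_st) X̄. Both variance estimators have the stratified form Σ W_h^2 (1 − n_h/N_h) s_eh^2 / n_h, applied to residuals e = y − R x with the corresponding ratio. The sketch below assumes the stratum population means of the auxiliary variable are known (for example, from a sampling frame); all names and numbers are hypothetical.

```python
import statistics

def ratio_estimators(y, x, N_h, Xbar_h):
    """y, x: dict stratum -> sampled values; N_h: stratum sizes;
    Xbar_h: known stratum population means of x.

    Returns ((separate estimate, variance), (combined estimate, variance)).
    """
    N = sum(N_h.values())
    W = {h: N_h[h] / N for h in N_h}
    Xbar = sum(W[h] * Xbar_h[h] for h in N_h)

    def residual_variance(ratio_for):
        # stratified variance of the residuals e = y - R * x
        v = 0.0
        for h in N_h:
            n = len(y[h])
            e = [yi - ratio_for(h) * xi for yi, xi in zip(y[h], x[h])]
            v += W[h] ** 2 * (1 - n / N_h[h]) * statistics.variance(e) / n
        return v

    # separate: one ratio per stratum
    R_h = {h: statistics.mean(y[h]) / statistics.mean(x[h]) for h in N_h}
    sep = sum(W[h] * R_h[h] * Xbar_h[h] for h in N_h)
    sep_var = residual_variance(lambda h: R_h[h])

    # combined: a single ratio of stratified means
    R_c = (sum(W[h] * statistics.mean(y[h]) for h in N_h)
           / sum(W[h] * statistics.mean(x[h]) for h in N_h))
    comb = R_c * Xbar
    comb_var = residual_variance(lambda h: R_c)
    return (sep, sep_var), (comb, comb_var)

# Toy data in which y is exactly 2 * x, so both estimators give 2 * Xbar
y = {1: [2, 4, 6], 2: [20, 40]}
x = {1: [1, 2, 3], 2: [10, 20]}
print(ratio_estimators(y, x, {1: 30, 2: 20}, {1: 2.5, 2: 14}))
```

The separate estimator can be noticeably biased when stratum sample sizes are small, which is the usual argument for the combined version in that situation.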

2f) Estimate the population domain means of TCR for the 5 types of INDUSTRY. Explain why you should use the combined ratio estimator. Compute the variance estimates and 95% confidence intervals. Provide the formulae used and the details of your calculation for the first industry. Your final estimates, variance estimates and confidence intervals should be given in a table. [10%]

TASK 3 (25%)

The sample dataset “Labor.xls” contains the following variables:

Cluster: cluster number

Person : person number

age : age of person

agecat : age category

1 : 19 years and under

2 : 20-24

3 : 25-34

4 : 35-64

5 : 65 years and over

race : 1 for non-black and 2 for black

sex : 1 for male and 2 for female

HourPerWk : usual number of hours worked per week

Wkly Wage : usual amount of weekly wages (in 1976 US$)

We suppose that these sample data have been selected with a two-stage sampling design. For both stages, simple random sampling has been used. The file “ClusterSize.xls” contains the (population) sizes of the clusters. We suppose that we have 2 000 000 individuals in the population, and that we have 30 000 clusters in the population. An electronic copy of “Labor.xls” and “ClusterSize.xls” is available on the module Blackboard site.

3a) Estimate the population mean of weekly wage (per individual). [5%]

3b) Compute the 95% confidence interval of the estimate found in 3a). [10%]

For 3a) and 3b): Provide details of your calculation. You should describe and justify the approach you used. You should provide the analytic expressions of the estimator and variance estimator used. You should also describe the key steps of your calculation.
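One possible approach for a two-stage design with simple random sampling at both stages is the unbiased estimator of the total, (N/n) Σ M_i ȳ_i, divided by the known number of individuals. Its variance estimator has a between-cluster term, N^2 (1 − n/N) s_t^2 / n, and a within-cluster term, (N/n) Σ M_i^2 (1 − m_i/M_i) s_i^2 / m_i. The sketch below is an illustration under the assumption that every sampled cluster contains at least two sampled persons; the function name and toy values are hypothetical, and a ratio estimator Σ M_i ȳ_i / Σ M_i is a common alternative.

```python
import math
import statistics

def two_stage_mean(clusters, M_i, N_clusters, M_total, z=1.96):
    """clusters: dict cluster id -> sampled values (at least 2 per cluster);
    M_i: dict cluster id -> population size of that cluster;
    N_clusters: number of clusters in the population;
    M_total: number of individuals in the population.

    Returns (mean per individual, standard error, confidence interval).
    """
    n = len(clusters)
    # estimated cluster totals t_i = M_i * ybar_i
    t_hat = {i: M_i[i] * statistics.mean(y) for i, y in clusters.items()}
    total = N_clusters / n * sum(t_hat.values())
    # first-stage (between-cluster) component
    between = (N_clusters ** 2 * (1 - n / N_clusters)
               * statistics.variance(list(t_hat.values())) / n)
    # second-stage (within-cluster) component
    within = N_clusters / n * sum(
        M_i[i] ** 2 * (1 - len(y) / M_i[i]) * statistics.variance(y) / len(y)
        for i, y in clusters.items()
    )
    mean = total / M_total
    se = math.sqrt(between + within) / M_total
    return mean, se, (mean - z * se, mean + z * se)

# Toy two-cluster sample (hypothetical values)
print(two_stage_mean({1: [10, 20], 2: [30, 50]}, {1: 4, 2: 6},
                     N_clusters=10, M_total=50))
```

With few sampled clusters, a t quantile in place of 1.96 is often preferred for the interval.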

TASK 4 (15%)

Suppose that we use a sampling design without replacement to select a sample $s$ of size $n$ from a population of size $N$. Let $\pi_i$ and $\pi_{ij}$ denote the first- and second-order inclusion probabilities of the sampling design used. We suppose that $\pi_i > 0$ and $\pi_{ij} > 0$ for all $i$ and $j$. Let $y_i$ denote the value of a variable of interest for the individual $i$ of the sample $s$. Consider the following estimator: