STAT6086 Sampling Techniques

Assignment 2022-23

You must submit one electronic copy of your report in PDF by 11.59pm on Tuesday 10th January 2023. You must submit your electronic copy (a single file) via the STAT6093 Blackboard website using TurnItIn (in the Assignments folder). Your Student ID Number must appear in your electronic copy. Make sure that your assignment fits in a single PDF document. A scanned handwritten document is not allowed. Note that the file has to be smaller than 10 MB. If your Word file is larger than this, try converting it to a readable PDF or saving your images (graphs, plots, …) as JPEG (instead of BMP).

It is the policy of the Department of Social Statistics that coursework should be anonymous; therefore only your Student ID Number should appear in your Word or PDF document. To maintain anonymity, please do not put your name on any part of your submission. You must put your Student ID Number on the first page of your coursework.

Note that it is not acceptable to read and gain ideas for your coursework from another student’s finished work. It is very important that you read carefully the section “Academic Integrity and Referencing” in the module outline (available on Blackboard).

Make sure that you have 4 sections called Task 1, Task 2, Task 3 and Task 4. Each subsection should also be clearly labelled: 1a), 1b), ..., 2a), 2b), 2c), ...

The maximum number of words is 6000.

Information about coursework submission, the penalty for late submission, the policy for over-length work, the procedure for coursework extensions, feedback, and academic integrity and referencing can be found in the module outline (available on Blackboard). It is very important that you read the module outline carefully.

ASSIGNMENT

The target population consists of 1653 farms in Australia (file “OzFarm_Frame.xls” on Blackboard). For each farm, you have: (i) ID for each farm, (ii) variable STATE, (iii) variable ZONE, (iv) variable REGION, (v) variable INDUSTRY and (vi) variable DSE. The descriptions of these variables are:

STATE:

1         New South Wales

2         Victoria

3         Queensland

4         South Australia

5         Western Australia

6         Tasmania

7         Northern Territory

ZONE:

1         Pastoral zone (inland)

2         Wheat-sheep zone (hinterland)

3         High rainfall zone (coastal)

REGION: Subdivision of State x Zone indicating a more homogeneous (in terms of climate, soil type etc.) farming area within a State. Three-digit code, with first digit = state, second digit = zone and third digit denoting region.

INDUSTRY:

1         crops specialist farm

2         mixed livestock/crops farm

3         sheep farm

4         beef farm

5         sheep-beef farm

DSE: A measure of the size of a farm in terms of its productive capacity. DSE stands for "Dry Sheep Equivalent" and is a linear combination of the reported numbers of sheep and beef cattle and hectares of crops area reported by the farm at the previous Agricultural Census.

TASK 1: (30%)

For this task, you need to use the size variable DSE to create strata.

1a) Create 4 strata using two different methods:

(i) the Dalenius and Hodges method (with classes of size 5000),

(ii) the cum() rule,

where the variable  is the variable DSE. For each method, present the details of your calculation and any analytic expressions needed.

[10%]
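The Dalenius and Hodges method is usually implemented as the cumulative square root of frequency (cum √f) rule: group the size variable into equal-width classes, cumulate the square roots of the class frequencies, and cut where the cumulated value is closest to equal shares of its total. The sketch below shows those mechanics on made-up data. The function name `cum_sqrt_f_boundaries`, the toy values and the "closest class" tie-breaking are assumptions for illustration, not part of the assignment files.

```python
import math

def cum_sqrt_f_boundaries(values, class_width, n_strata):
    """Upper stratum boundaries from the cumulative sqrt(frequency) rule."""
    # Count how many values fall in each class [k*width, (k+1)*width).
    # Assumes non-negative values.
    n_classes = int(max(values) // class_width) + 1
    freq = [0] * n_classes
    for v in values:
        freq[int(v // class_width)] += 1

    # Cumulate the square roots of the class frequencies.
    cum, running = [], 0.0
    for f in freq:
        running += math.sqrt(f)
        cum.append(running)

    # For h = 1..H-1, cut at the upper limit of the class whose cumulated
    # value is closest to h * total / H.
    boundaries = []
    for h in range(1, n_strata):
        target = h * running / n_strata
        k = min(range(n_classes), key=lambda i: abs(cum[i] - target))
        boundaries.append((k + 1) * class_width)
    return boundaries

# Toy data (hypothetical): class frequencies 16, 9, 4, 1 with class width 10.
toy = [5] * 16 + [15] * 9 + [25] * 4 + [35]
print(cum_sqrt_f_boundaries(toy, class_width=10, n_strata=2))
```

With very few classes the same boundary can be selected twice, so the boundaries should be checked by hand before strata are formed.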

1b) Suppose you want to select a sample of size  . What would be the optimal allocation (according to the variable DSE) for the 2 methods of stratification (i) and (ii) described in 1a)? Provide the details of your calculation and any analytic expressions needed. [5%]
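Assuming equal sampling costs across strata, optimal (Neyman) allocation gives each stratum a share of the sample proportional to N_h × S_h, where N_h is the stratum size and S_h the stratum standard deviation of the size variable. A minimal sketch follows. The function name and the numbers are hypothetical.

```python
def neyman_allocation(n, N_h, S_h):
    """Unrounded Neyman allocation: n_h = n * N_h * S_h / sum(N_h * S_h)."""
    products = [N * S for N, S in zip(N_h, S_h)]
    total = sum(products)
    return [n * p / total for p in products]

# Hypothetical strata: sizes 100 and 50, standard deviations 1 and 2.
print(neyman_allocation(20, [100, 50], [1.0, 2.0]))
```

The result still has to be rounded to integers, and any stratum where n_h exceeds N_h has to be taken in full, with the remaining sample reallocated among the other strata.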

1c) Compute the variances of the mean of the variable DSE under the 2 methods of stratification (i) and (ii), when  and under optimal allocation. Provide the formulae and details of your calculation. Which stratification method would you recommend? [5%]
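For simple random sampling without replacement within strata, the variance of the stratified mean is the sum over strata of W_h² (1 − n_h/N_h) S_h²/n_h, with W_h = N_h/N. The sketch below evaluates that expression for a given allocation. The function name and inputs are hypothetical.

```python
def stratified_mean_variance(N_h, S2_h, n_h):
    """Variance of the stratified mean under SRSWOR within each stratum."""
    N = sum(N_h)
    return sum(
        (Nh / N) ** 2 * (1 - nh / Nh) * S2 / nh
        for Nh, S2, nh in zip(N_h, S2_h, n_h)
    )

# Hypothetical example: two strata of 100 units, S_h^2 = 4, 10 units sampled in each.
print(stratified_mean_variance([100, 100], [4.0, 4.0], [10, 10]))
```

Evaluating this once per stratification method, each with its own optimal allocation, gives two variances that can be compared directly.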

1d) What would be the minimal sample size needed to achieve a coefficient of variation (CV) of 5% for the mean of the variable DSE, for the stratification obtained with the cum() rule and optimal allocation? What would be the minimal sample size if you use proportional allocation instead of optimal allocation? [10%]
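A target CV translates into a target variance V = (CV × mean)². Solving the variance expressions for n gives n = (Σ W_h S_h)² / (V + Σ W_h S_h² / N) under optimal (Neyman) allocation and n = Σ W_h S_h² / (V + Σ W_h S_h² / N) under proportional allocation. The sketch below applies both. The function name and the inputs are hypothetical.

```python
import math

def n_for_cv(cv, mean, N_h, S_h, allocation):
    """Smallest n reaching a target CV for the stratified mean (SRSWOR in strata)."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    target_var = (cv * mean) ** 2
    within = sum(w * s ** 2 for w, s in zip(W, S_h))  # sum of W_h * S_h^2
    if allocation == "optimal":
        numerator = sum(w * s for w, s in zip(W, S_h)) ** 2
    elif allocation == "proportional":
        numerator = within
    else:
        raise ValueError("allocation must be 'optimal' or 'proportional'")
    return math.ceil(numerator / (target_var + within / N))

# Hypothetical population: two strata of 500 units, S_h = 10 and 30, mean 100.
for method in ("optimal", "proportional"):
    print(method, n_for_cv(0.05, 100.0, [500, 500], [10.0, 30.0], method))
```

Because (Σ W_h S_h)² never exceeds Σ W_h S_h², the optimal allocation never requires a larger sample than proportional allocation.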

TASK 2: (45%)

A stratified sample of  units has been selected from the 1653 farms in Australia. The stratification is given by the variable ZONE (3 strata). The sample data can be found in the file “OzFarm_Sample.xls” on Blackboard. You will see that the sample data contains additional variables:

TCC               Total Cash Costs of farm over financial year

TCR                Total Cash Receipts of farm over financial year (A$)

EQUITY        Value of farm assets less farm debt at end of financial year (A$)

DEBT             Farm debt at end of financial year (A$)

2a) Which allocation has been used? Explain your answer. [3%]

2b) Estimate the population mean of TCC. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and the details of your calculation. [5%]
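For a stratified simple random sample, the usual estimator of the population mean is Σ W_h ȳ_h, with estimated variance Σ W_h² (1 − n_h/N_h) s_h²/n_h and a normal-approximation interval. A minimal sketch on made-up data follows. The function name, the toy samples and the use of 1.96 as the critical value are assumptions.

```python
import math
from statistics import mean, variance

def stratified_mean(samples, N_h, z=1.96):
    """Stratified estimate of a mean, its variance estimate and a 95% CI."""
    N = sum(N_h)
    est, var = 0.0, 0.0
    for y, Nh in zip(samples, N_h):
        nh = len(y)  # each stratum needs at least 2 sampled units
        W = Nh / N
        est += W * mean(y)
        var += W ** 2 * (1 - nh / Nh) * variance(y) / nh
    half_width = z * math.sqrt(var)
    return est, var, (est - half_width, est + half_width)

# Hypothetical sample: two strata with population sizes 30 and 70.
print(stratified_mean([[1, 2, 3], [10, 12, 14]], [30, 70]))
```

Applied to a 0/1 indicator (for example, 1 when DEBT < EQUITY), the same function estimates a population proportion.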

2c) Estimate the population proportion of farms with DEBT < EQUITY. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and the details of your calculation. [5%]

2d) Estimate the population mean of TCC using the separate ratio estimator, with DSE as auxiliary variable. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and the details of your calculation. [7%]

2e) Estimate the population mean of TCC using the combined ratio estimator, with DSE as auxiliary variable. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and the details of your calculation. Compare your results with 2d). [10%]
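The two ratio estimators differ in where the ratio is formed. The separate estimator computes a ratio ȳ_h/x̄_h in every stratum and weights the resulting stratum estimates. The combined estimator forms a single ratio of the stratified means ȳ_st/x̄_st and scales it by the population mean of the auxiliary variable. The sketch below computes the point estimates only, on made-up data. The function names and values are hypothetical, and variance estimation is not shown.

```python
from statistics import mean

def separate_ratio_mean(y, x, N_h, Xbar_h):
    """Separate ratio estimator: sum over strata of W_h * (ybar_h / xbar_h) * Xbar_h."""
    N = sum(N_h)
    return sum(
        Nh / N * mean(yh) / mean(xh) * Xb
        for yh, xh, Nh, Xb in zip(y, x, N_h, Xbar_h)
    )

def combined_ratio_mean(y, x, N_h, Xbar_h):
    """Combined ratio estimator: (ybar_st / xbar_st) * Xbar."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    y_st = sum(w * mean(yh) for w, yh in zip(W, y))
    x_st = sum(w * mean(xh) for w, xh in zip(W, x))
    Xbar = sum(w * Xb for w, Xb in zip(W, Xbar_h))  # population mean of x
    return y_st / x_st * Xbar

# Hypothetical data: two strata of 50 units, known stratum means of x equal to 2 and 12.
y = [[2, 4], [30, 50]]
x = [[1, 2], [10, 20]]
print(separate_ratio_mean(y, x, [50, 50], [2.0, 12.0]))
print(combined_ratio_mean(y, x, [50, 50], [2.0, 12.0]))
```

The separate estimator needs the population mean of the auxiliary variable in every stratum and can be noticeably biased when stratum sample sizes are small, whereas the combined estimator only needs the overall mean.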

2f) Estimate the population domain mean of TCR for the 5 types of INDUSTRY. Explain why you should use the combined ratio estimator. Compute the variance estimates and 95% confidence intervals. Provide the formulae used and the details of your calculation for the first industry. Your final estimates, variance estimates and confidence intervals should be given in a table. [10%]

TASK 3: (25%)

The sample dataset “Labor.xls” contains the following variables:

Cluster: cluster number

Person : person number

age : age of person

agecat : age category

1 : 19 years and under

2 : 20-24

3 : 25-34

4 : 35-64

5 : 65 years and over

race : 1 for non-black and 2 for black

sex : 1 for male and 2 for female

HourPerWk : usual number of hours worked per week

Wkly Wage : usual amount of weekly wages  (in 1976 US$)

We suppose that these sample data have been selected with a two-stage sampling design. For both stages, simple random sampling has been used. The file “ClusterSize.xls” contains the (population) sizes of the clusters. We suppose that we have 2 000 000 individuals in the population, and that we have 30 000 clusters in the population. An electronic copy of “Labor.xls” and “ClusterSize.xls” is available on the module Blackboard site.

3a) Estimate the population mean of weekly wage (per individual). [5%]

3b) Compute the 95% confidence interval of the estimate found in 3a). [10%]

For 3a) and 3b): Provide details of your calculation. You should describe and justify the approach you used. You should provide the analytic expressions of the estimator and variance estimator used. You should also describe the key steps of your calculation.
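One possible approach, when the total number of individuals in the population is known, is the unbiased two-stage estimator: estimate each sampled cluster total by M_i ȳ_i, expand the sum by N/n to estimate the population total, and divide by the number of individuals. Its usual variance estimator adds a between-cluster term and a within-cluster term. The sketch below illustrates this on made-up data. The function name, argument names and numbers are hypothetical, and a ratio-type estimator (Σ M_i ȳ_i / Σ M_i) is an alternative that is not shown.

```python
import math
from statistics import mean, variance

def two_stage_mean(clusters, M, N, M0, z=1.96):
    """Unbiased two-stage (SRS at both stages) estimate of a per-individual mean.

    clusters: sampled values per sampled cluster (at least 2 values each)
    M: population sizes of the sampled clusters
    N: number of clusters in the population
    M0: number of individuals in the population
    """
    n = len(clusters)
    t_hat = [Mi * mean(y) for y, Mi in zip(clusters, M)]  # estimated cluster totals
    total = N / n * sum(t_hat)
    # Between-cluster component of the variance of the estimated total.
    between = N ** 2 * (1 - n / N) * variance(t_hat) / n
    # Within-cluster component (second-stage sampling).
    within = N / n * sum(
        Mi ** 2 * (1 - len(y) / Mi) * variance(y) / len(y)
        for y, Mi in zip(clusters, M)
    )
    est = total / M0
    var = (between + within) / M0 ** 2
    half_width = z * math.sqrt(var)
    return est, var, (est - half_width, est + half_width)

# Hypothetical data: 2 of 10 clusters sampled, 50 individuals in the population.
print(two_stage_mean([[10, 20], [30, 50]], M=[4, 6], N=10, M0=50))
```

When cluster sizes vary a lot, the between-cluster term of this estimator can be large, which is the usual argument for considering the ratio-type alternative.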

TASK 4: (15%)

Suppose that we use a sampling design without replacement to select a sample s of size n from a population of size N. Let π_i and π_ij denote the first- and second-order inclusion probabilities of the sampling design used. We suppose that π_i > 0 and π_ij > 0 for all i and j. Let y_i denote the value of a variable of interest for the individual i of the sample s. Consider the following estimator:

For which population parameter is this estimator an unbiased estimator? Justify your answer. [15%]