
STAT0024A5UC, STAT0024A5UD

STAT0024 - Social Statistics

2022

Question 1

In each of (a) and (b) below, a sampling scheme is described.  In each case name the type of scheme described.  For each scheme state one advantage and suggest one potential problem.

(a) [Type] To sample from sequentially numbered invoices in a digital archive, an auditor chooses a random starting point amongst the first 10 numbers in the list, then takes this invoice and every 10th invoice thereafter until the required sample size is achieved. Word limit: 100 words. [3]
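For illustration only, the selection procedure described in (a) might be sketched in Python as below. The function name `select_invoices`, the archive size of 1000 invoices, the sample size of 25 and the seed are assumptions made for this example; none of them is given in the question.

```python
import random


def select_invoices(n_invoices, sample_size, interval=10, seed=None):
    """Pick invoice numbers as in (a): random start, then every `interval`-th.

    Invoices are assumed to be numbered 1..n_invoices (an assumption for
    this sketch; the question only says they are sequentially numbered).
    """
    rng = random.Random(seed)
    # Random starting point amongst the first `interval` invoice numbers.
    start = rng.randint(1, interval)
    # This invoice and every `interval`-th invoice thereafter.
    candidates = list(range(start, n_invoices + 1, interval))
    if len(candidates) < sample_size:
        raise ValueError("archive too small for the requested sample size")
    # Stop once the required sample size is reached.
    return candidates[:sample_size]


# Hypothetical usage: 1000 invoices, sample of 25.
picks = select_invoices(1000, sample_size=25, seed=1)
print(picks[:5])
```

Only the starting point is random here; once it is drawn, every other selected invoice number is fixed.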

(b) [Type] To recruit participants for an email survey on student attitudes to on-line learning, a teacher emails the survey to the 8 students in her tutorial class. The email asks the students to complete a questionnaire and then to forward a copy of the email to as many as possible of their student friends, asking them in turn to complete the questionnaire and recruit some of their friends. Word limit: 100 words. [3]

Question 2

The following questions are intended to form part of a questionnaire designed to assess the attitudes of the inhabitants of a city to the imposition of a zero emissions zone (ZEZ) for traffic in the city centre.

Q1.  In view of the serious damage caused to people’s health by vehicle emissions and the need to reduce our dependence on oil in the interests of saving the planet, the ZEZ should be implemented as soon as possible.

Q2.  Exemptions should be granted for refuse trucks and other council vehicles and for emergency services.

Q3.  It is not a good idea not to allow deliveries by petrol vehicles to city centre shops in the early mornings.

Each question will have the same choices of response:

strongly disagree/disagree/neutral/agree/strongly agree

(a) [Type] Criticise each of the 3 questions and provide a better wording in each case, explaining why they ought to be changed and writing out your new questions in full.  The new versions must still use the 5-category scale above for the response. Word limit: 200 words. [9]

(b) [Type] The responses are to be converted to a numerical scale from 1 to 5, and the results averaged over questions to give an overall score for each respondent. Suggest, taking account of their polarity, a coding for each of your rewritten questions. Using your coding, what attitude does a high overall score correspond to? Word limit: 150 words. [4]

Question 3

A researcher wishes to take a simple random sample of size n from a finite population of size N=1000 in order to estimate the population mean of the variable Y. They would like the sample to be large enough for the eventual 95% confidence interval for the population mean to have width 2. Pressed to have a guess at the population standard deviation S of Y, the researcher is only prepared to say it is probably between 2 and 10.

(a)  Calculate the range of sample sizes implied by this range of population standard deviations. Explain the steps in your argument, and define the notation in any formula you quote.  [6]

(b)  [Type] Having done this calculation what would you say to the researcher? Word limit: 100 words. [2]

Question 4

After taking a one-stage cluster sample, the three sampled clusters have sample means 33, 42 and 50 and sizes 50, 15 and 35, respectively.

(a)  Suppose the clusters were chosen by simple random sampling from the set of all clusters.  Quote a suitable formula, defining your notation, and use it to estimate the population mean. [4]

(b)  Now suppose the clusters were sampled randomly but with probability proportional to size. State an alternative estimator that is suitable for this case and use this to estimate the population mean. [3]

(c) [Type] Discuss briefly the unbiasedness or otherwise of the estimators in (a) and (b) above under their respective sampling schemes. Word limit: 150 words. [4]

Question 5

A school wishes to assess the average mathematical ability of the pupils in a particular year using a written test. There are 5 classes in the year, each with 30 pupils. To reduce the amount of effort involved in marking the tests it is proposed that only a sample of 60 pupils will take the test. Three different sampling schemes are under consideration: simple random sampling, stratified random sampling with classes as strata, and one-stage cluster sampling with classes as clusters.

(a) [Type] Explain how the samples would be taken under each of these three schemes. Word limit: 70 words. [3]

Discuss the relative merits of the three schemes under each of the following scenarios.

(b) [Type] The classes have been streamed on the basis of last year's exam results, with the 30 best scoring pupils in the first class, the next 30 in the second class and so on. Word limit: 150 words. [4]

(c) [Type] The classes have been formed so that each class contains a full range of abilities. Word limit: 150 words. [4]

(d) [Type] The pupils have been randomly assigned to classes. Word limit: 150 words. [4]

Question 6

A university located in a city wishes to estimate what proportion of the 1000 students in its 4 halls of residence regularly walk to the campus for their classes. The halls are at very different distances from the campus, so that the proportions are likely to differ between halls. The table below gives the number of students in each hall and a guess at the likely proportions of walkers.

Hall                              A      B      C      D
Number of students                400    300    100    200
Guess at proportion of walkers    0.9    0.8    0.5    0.2

It is decided to take a sample of 100 students using stratified random sampling with the halls as strata.

(a) Defining any notation you use, explain how Neyman allocation would divide this sample of 100 between the 4 halls and calculate the numbers to be sampled from each hall under this scheme. In what circumstances is this the optimal allocation? [6]

(b) Defining any further notation you use, write down the formula for the usual estimate of the population proportion when using stratified random sampling. You are not required to compute anything for this example. [2]

(c) Assuming for this purpose that the guessed proportions are correct, use the data in the table above to calculate the variance of the estimator in (b) under Neyman allocation.  [4]

(d) Calculate the sample sizes for proportional allocation, compare them with those for Neyman allocation and comment on the differences.  [4]

(e) Using the data in the table above calculate the variance of the estimator in (b) under proportional allocation and comment on how it compares with the variance for Neyman allocation. [6]

Question 7

In order to estimate the average household income in a town, households in a simple random sample of 500 were asked to report their income. Only 400 of these households provided the requested information. As part of the data analysis the table below was produced. For each of the 4 areas into which the town may be divided it shows the numbers of responding and non-responding households as well as the mean income of the responding households in thousands of pounds.

Area      Households responding    Households not responding    Households in sample    Mean income of responding households (£k)
A         100                      20                           120                     40
B         80                       10                           90                      30
C         100                      10                           110                     35
D         120                      60                           180                     50
Totals    400                      100                          500

(a)  Compute the response rate for each area and comment.  [4]

(b)  Without doing any further calculations or going into great detail, suggest a method that could be used to examine the strength of any evidence for real differences in response rates between areas. [2]

(c)  Estimate the average household income in the town using a complete case analysis.  Explain carefully the steps in your calculation.  [4]

(d)  Estimate the average household income in the town using inverse probability weighting to correct for the varying response rates in the different areas.  Explain carefully the steps in your calculation.  [4]

(e)  Compare the results in (c) and (d) and explain how any difference arises.  [3]

(f)  [Type] Under what conditions on the missing data mechanism is each of the analyses in parts (c) and (d) valid? Which of the two analyses would you prefer, and why? Discuss whether your preferred analysis is likely to correct for all the bias induced by the missing responses. For full marks here the answer needs to relate the conditions to this particular example rather than simply repeating generalities from the notes. Word limit: 400 words. [8]