STATS 240
Qu. 1. [12 marks] An experimenter was interested in the effect of adding sodium fluoride (NaF) to blood
samples that had been submitted for blood alcohol determination. There was speculation that
this process of “salting” caused higher blood alcohol readings. Six volunteer subjects were given
alcoholic drinks over a one-hour period to raise their blood alcohol concentrations to between .08%
and .10% W/V. Three tubes of blood were taken from each subject and 0, 5, or 10 mg/ml of sodium
fluoride was added to each tube. The blood alcohol concentration of each tube was then measured.
(a) Identify the following elements of this experiment:
(i) The response.
(ii) A treatment factor.
(iii) A blocking factor.
(iv) An experimental unit.
(b) Randomisation is one of Fisher’s three principles of experimental design. Explain how
randomisation should have been applied in this experiment.
(c) Suppose that (prior to running the experiment) the experimenter decided that they wanted
to increase the amount of replication. They propose that after the sodium fluoride has been
added to each tube, the contents of the tube be divided into two parts and the blood alcohol
concentration measured on each part. Will this actually increase the number of times each
treatment is replicated? Explain your answer.
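One way to picture the randomisation asked about in part (b): within each subject (block), the three NaF doses are allocated to that subject's three tubes in an independently shuffled order. The sketch below is only an illustration of that idea. The seed, the tube numbering and the function name `randomise_doses` are assumptions and are not part of the original experiment.

```python
import random

DOSES_MG_PER_ML = [0, 5, 10]   # NaF doses given in the question
N_SUBJECTS = 6                 # six volunteer subjects, used as blocks

def randomise_doses(n_subjects, doses, seed=None):
    """Return {subject: [dose for tube 1, tube 2, tube 3]} using an
    independent random permutation of the doses within each subject."""
    rng = random.Random(seed)
    plan = {}
    for subject in range(1, n_subjects + 1):
        order = doses[:]       # copy so the master list is left untouched
        rng.shuffle(order)     # fresh shuffle for every block
        plan[subject] = order
    return plan

plan = randomise_doses(N_SUBJECTS, DOSES_MG_PER_ML, seed=240)  # arbitrary seed
for subject, order in plan.items():
    print(f"subject {subject}: tubes 1-3 receive {order} mg/ml")
```

Every subject still receives each dose exactly once; only the tube-to-dose allocation within a subject is left to chance.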
Qu. 2. [13 marks] A study was conducted to compare four methods of treating physical discomfort in
patients suffering from chronic asthma. The four methods were:
AL: drug A at a low dose, BL: drug B at a low dose,
AH: drug A at a high dose, BH: drug B at a high dose.
Notice that these 4 methods can be considered as a factorial arrangement of treatment factors drug
and dose.
The experimenter decided to use “time to relief” as the response (measured in minutes). Since
response to a drug often varies considerably from patient-to-patient, it was decided to use patients
as blocks. It was also thought that the order in which the patient receives the drugs may impact the
response. Therefore, a 4 × 4 Latin square design was used and the following results were obtained:
                      Patients
Order      1          2          3          4
  1     14 (AH)    30 (AL)    15 (BH)    28 (BL)
  2     25 (AL)    15 (BH)    22 (BL)    21 (AH)
  3     17 (BL)    19 (AH)    26 (AL)    22 (BH)
  4      7 (BH)    22 (BL)    19 (AH)    34 (AL)
(a) The following ANOVA table for this data was produced using R.
Error: patient
            Df  Sum Sq  Mean Sq  F value     Pr(>F)
Residuals  [A]  ******   74.167

Error: order
            Df  Sum Sq  Mean Sq  F value     Pr(>F)
Residuals    3     [B]   1.1667

Error: patient:order
            Df  Sum Sq  Mean Sq  F value     Pr(>F)
drug         1  ******   100.00    24.00  0.0027137 **
dose         1  ******   324.00      [C]  0.0001181 ***
drug:dose  [D]  ******     9.00     2.16  0.1920392
Residuals  [E]  ******     4.17
Calculate the values of A-E that are missing from the ANOVA table (show any working).
A= B=
C= D=
E=
(b) Tables of treatment means are:
> model.tables(comfort.aov,"means")
Tables of means
Grand mean
21

drug
drug
   A    B
23.5 18.5

dose
dose
   H    L
16.5 25.5

drug:dose
    dose
drug     H     L
   A 18.25 28.75
   B 14.75 22.25
Based on the ANOVA table in (a) and the output from model.tables, what conclusions
do you draw concerning each of the following (briefly justify your answers):
(i) The impact of the treatment factors on the response.
(ii) The usefulness of blocking on patient and of blocking on order.
(c) Suppose that the experiment had used twelve patients instead of four. The patients were
divided into three groups of four, and for each group a 4 × 4 Latin square was used to block
on both patient and order.
(i) Would the blocking structure for this design be squares*patient*order,
(squares/patient)*order, (squares/order)*patient or squares/(patient*order)?
Explain your answer.
(ii) For the blocking structure you chose in (i) write down the set of error strata that would
occur in the ANOVA table (assume that the blocking factors do not interact) and give
the Error degrees of freedom for each stratum. Note: you can get full marks for this
part even if you select the wrong structure in part (i).
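The printed entries of the ANOVA table in part (a) can be cross-checked against the raw Latin square data. The sketch below recomputes the sums of squares directly from the 16 observations. It is a hand-rolled check, not the R code that produced the table, and the helper name `ss_between` is an assumption made for illustration.

```python
# Time to relief (minutes): rows are order 1-4, columns are patients 1-4.
times = [
    [14, 30, 15, 28],
    [25, 15, 22, 21],
    [17, 19, 26, 22],
    [7, 22, 19, 34],
]
# Treatment given in each cell (drug letter followed by dose letter).
treat = [
    ["AH", "AL", "BH", "BL"],
    ["AL", "BH", "BL", "AH"],
    ["BL", "AH", "AL", "BH"],
    ["BH", "BL", "AH", "AL"],
]

# One tuple per observation: (response, order index, patient index, treatment).
cells = [(times[i][j], i, j, treat[i][j]) for i in range(4) for j in range(4)]
grand_mean = sum(c[0] for c in cells) / len(cells)

def ss_between(key):
    """Sum of squares between the groups defined by key(cell)."""
    groups = {}
    for cell in cells:
        groups.setdefault(key(cell), []).append(cell[0])
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
               for g in groups.values())

ss_total = sum((c[0] - grand_mean) ** 2 for c in cells)
ss_order = ss_between(lambda c: c[1])
ss_patient = ss_between(lambda c: c[2])
ss_drug = ss_between(lambda c: c[3][0])      # first letter: drug A or B
ss_dose = ss_between(lambda c: c[3][1])      # second letter: dose H or L
ss_interaction = ss_between(lambda c: c[3]) - ss_drug - ss_dose
ss_resid = (ss_total - ss_order - ss_patient
            - ss_drug - ss_dose - ss_interaction)

# Total df (15) minus 3 each for patients, orders and treatments.
df_resid = 15 - 3 - 3 - 3
ms_resid = ss_resid / df_resid

print("patient MS :", ss_patient / 3)
print("order MS   :", ss_order / 3)
print("residual MS:", ms_resid)
print("F for drug :", ss_drug / ms_resid)
print("F for dose :", ss_dose / ms_resid)
```

The patient, order and residual mean squares and the drug F ratio computed this way agree with the values printed in the table, which is a useful sanity check before filling in the missing entries.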
Qu. 3. [13 marks] “Green manure” is a cover crop sown on an agricultural plot in order to fertilize the
soil for the following crop. The following data comes from an experiment that compared the effects
of four green manure crops (Fallow, Barley, Vetch, Barley plus Vetch) on the yields of sugar beets
(subsequently planted) at two levels of nitrogen fertiliser (none, 120 lb/acre).
                          green manure crop
Block  nitrogen         F       B       V      BV
  1    none          13.8    15.5    21.0    18.9
       120 lb/acre   19.3    22.2    25.3    25.9
  2    none          13.5    15.0    22.7    18.3
       120 lb/acre   18.0    24.2    24.8    26.7
  3    none          13.2    15.2    22.3    19.6
       120 lb/acre   20.5    25.4    28.4    27.6
The following ANOVA table was produced using iNZight lite:
Error: Block
                  Df  Sum Sq  Mean Sq  F value    Pr(>F)
Residuals          2   7.866    3.933

Error: Block:Plot
                  Df  Sum Sq  Mean Sq  F value    Pr(>F)
Nitrogen           1  262.02   262.02    104.1   0.00947 **
Residuals          2    5.04     2.52

Error: Within
                  Df  Sum Sq  Mean Sq  F value    Pr(>F)
Gmanure            3  215.26    71.75   118.96  3.43e-09 ***
Nitrogen:Gmanure   3   18.70     6.23    10.33   0.00121 **
Residuals         12    7.24     0.60
(a) It is clear from the ANOVA table that the experimenters did not use a completely randomised
design.
(i) What type of design was used for this experiment?
(ii) Describe the blocking structure for this design.
(iii) Describe the treatment structure for this design.
(iv) Explain how treatments were assigned to experimental units for this design.
(5 marks)
(b) The “tables of means” are:
Grand mean
20.72083

Nitrogen
     0    120
17.417 24.025

Gmanure
     B     BV      F      V
19.583 22.833 16.383 24.083

Nitrogen:Gmanure
         Gmanure
Nitrogen      B     BV      F      V
     0   15.233 18.933 13.500 22.000
     120 23.933 26.733 19.267 26.167
Values of the least significant difference (LSD) and Tukey’s studentised range (TSR) for
comparing means from these tables (α = .05) are:
                                  Nitrogen:Gmanure
      Nitrogen  Gmanure  same Nitrogen level  different Nitrogen levels
LSD       2.79     0.97                 1.38                       3.68
TSR       2.79     1.33                 2.29                       7.88
The standard formula for the TSR is: q × √(ResMS / reps).
In iNZight lite you need to supply values of df, ResMS, rep and means in addition to setting
alpha to .05. Fill in the values that would be used for each table of means.
For Nitrogen:
df= ResMS = reps = means =
For Gmanure:
df= ResMS = reps = means =
For Nitrogen:Gmanure same level of Nitrogen:
df= ResMS = reps = means =
For Nitrogen:Gmanure different level of Nitrogen:
df= ResMS = reps = means =
[Interaction plot: mean of yield (vertical axis, 14 to 26) against Nitrogen (0, 120), with one line for each Gmanure level: BV, V, B, F.]
(c) Write a short paragraph summarising how the two treatment factors affect the yield of sugar
beets. Support your conclusions – reference the ANOVA table, the tables of means and the
interaction plot given above as you see fit.
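As a companion to the TSR formula in part (b), the first three LSD values in the table can be reproduced with the usual LSD formula, t × √(2 × ResMS / reps). The sketch below is a check under stated assumptions: the t critical values are hard-coded from standard two-sided 5% tables, the choice of error stratum for each comparison is one reading of the ANOVA table, and the function name `lsd` is illustrative. The "different Nitrogen levels" comparison mixes two error strata and is not attempted here.

```python
from math import sqrt

# Two-sided 5% critical values of Student's t, taken from standard tables.
T_CRIT = {2: 4.303, 12: 2.179}

def lsd(df, res_ms, reps):
    """Least significant difference between two means, each based on
    `reps` observations, using a residual mean square on `df` df."""
    return T_CRIT[df] * sqrt(2 * res_ms / reps)

# Nitrogen means: Block:Plot stratum (df 2, ResMS 2.52), 12 values per mean.
lsd_nitrogen = lsd(df=2, res_ms=2.52, reps=12)
# Gmanure means: Within stratum (df 12, ResMS 0.60), 6 values per mean.
lsd_gmanure = lsd(df=12, res_ms=0.60, reps=6)
# Gmanure means at the same Nitrogen level: Within stratum, 3 values per mean.
lsd_same_level = lsd(df=12, res_ms=0.60, reps=3)

print(round(lsd_nitrogen, 2), round(lsd_gmanure, 2), round(lsd_same_level, 2))
```

The three results agree with the LSD row of the table to two decimal places, which supports this reading of which residual mean square goes with which table of means.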
Qu. 4. [10 marks]
(a) If we divide the population into non-overlapping groups, and collect data from each group,
what sort of design have we used?
(b) List two reasons why a researcher might consider stratified sampling.
(c) What is being traded off in choosing cluster sampling compared with simple random sampling?
(d) (i) If we sample n = 100 people at random from a population that has N = 2000 people,
what weight should you give each person?
(ii) If we were to use this sample to estimate the mean number of hours worked per week for
people in the population, should we use a finite population correction?
(e) What is being corrected when we make “finite population corrections” and are they being made
bigger or smaller?
(f) Suppose we take a census of the entire population of families and estimate that the mean family
size is 2.97 people. What is the standard error of this estimate?
(g) In a simple random sample of sample size n taken from a population of size N , what is the
sampling fraction?
(h) What is a hexbin plot and how are such plots used for plotting survey data?
(i) What do smoothers on plots help reveal?
(j) Fill in the missing word.
When responses from units in the same cluster are highly correlated, standard errors for a
cluster sample are ________ than standard errors for a simple random sample of the same
size.
(k) Write an equation to represent the relationship between the sample size (n), effective sample
size (ess), and design effect (deff) of a survey.
(l) If there are big differences between stratum variances, and also big differences in sampling costs
between strata, what type of strata does optimally allocated stratified sampling oversample?
(m) In a sample where a weight for each unit represents the inverse of the sampling fraction, what
estimate is represented by:
(i) the sum of weights across the sample?
(ii) the sum of the product of weights and values across the sample (e.g., values may be
number of cars owned; total income, etc.)?
(iii) the sum of the product of weights and values across the sample DIVIDED by the sum of
weights across the sample?
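The quantities referred to in parts (d), (e), (k) and (m) can be illustrated with a small weighted-sample calculation. The sketch below assumes a simple random sample with N = 2000 and n = 100 as in part (d). The response values are hypothetical and invented purely for illustration, and the function name `effective_sample_size` is likewise an assumption.

```python
from math import sqrt

N, n = 2000, 100                 # population and sample sizes from part (d)
sampling_fraction = n / N
weight = 1 / sampling_fraction   # each sampled person stands for N / n people
fpc = sqrt(1 - n / N)            # factor multiplying the usual standard error

# Hypothetical responses (e.g. number of cars owned) for the 100 people.
values = [1, 2, 0, 3] * 25
weights = [weight] * n

sum_of_weights = sum(weights)                                 # population size
weighted_total = sum(w * v for w, v in zip(weights, values))  # population total
weighted_mean = weighted_total / sum_of_weights               # population mean

def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff

print(weight, sum_of_weights, weighted_total, weighted_mean)
print(round(fpc, 4), effective_sample_size(n, deff=2.0))
```

Because the correction factor is below 1, applying it shrinks the standard error, and a design effect above 1 means the survey behaves like a smaller simple random sample.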
Qu. 5. [7 marks]
Question 5 relates to Andrew Sporle’s lectures.
This last sheet will be detached from the rest of your answers during marking so it is essential that
you supply your name and ID information again here:
(a) What is the benefit of pretesting survey questions?
(b) A forestry company wants to do a sample survey of trees in a very remote forest. The survey
involves cutting down the sampled trees and removing them by helicopter. List three (3) reasons
why the survey statistician who is designing the survey would be interested in designing the
most efficient survey possible.
(c) What type of missing data cannot be resolved by using imputation?
(d) Name a measure of socio-economic position that would be suitable for a study of health care
use by the elderly.
(e) Name a type of numerical rating scale commonly used in survey questions.
(f) A standardized interview schedule helps reduce what source or type of information bias?
2025-12-27