Understanding Data and Statistical Design (60117) Lab 2
Lab 2: One factor, two level experiments
This lab is marked out of 18.
Please submit in PDF format via Canvas.
Due by 23:59 Saturday 19th August 2023.
In this week's lab we look at a range of T-tests.
Question 1 [12 marks]
In this question we analyse the effect of fertilizer on the yield of a type of grass. The variables we consider are summarised in the table below.
Name | Type | Description
yld | continuous numerical | grass yield in 100 lbs per acre
feTt | factor | fertilizer quantity: 1 (none), 2 (low)
The sample data consists of 12 observations of yld for each level of feTt (data available in lab2a.csv on Canvas). Not much information is available with this data set, so we will suppose that 24 plots were prepared in a homogeneous manner. The 2 fertilizer concentrations were randomly assigned 12 times each across these 24 plots.
The statistical model for this experiment is
yld_{ij} = μ_i + E_{ij},  i ∈ {1, 2},  j ∈ {1, 2, …, 12},
where
• yld_{ij} is the yield from the j-th plot with feTt = i
• μ_i is the population mean yield with feTt = i
• E_{ij} is the random effect from the j-th plot with feTt = i.
The components of the experiment design:
• factor – fertiliser quantity (variable feTt) with levels 1 (none) and 2 (low)
• treatments – same as the factor levels as there is only 1 factor
• experimental units – the 2 groups of 12 plots each allocated a treatment
• measurement units – the 24 plots used in the experiment
• response variable – grass yield (variable yld).
First, we compute sample means of yld for each level of feTt (R output copied below).
1 2
96.08333 148.43333
Next, we compute sample standard deviations of yld for each level of feTt (R output copied below).
1 2
35.92244 43.73857
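As a rough illustration of how such per-level summaries are obtained, here is a minimal sketch in Python (standard library only) rather than the lab's R code. The rows below are hypothetical values invented for the example; they are not the contents of lab2a.csv, and the helper name `summarise` is likewise an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (level, yield) rows, NOT the real lab2a.csv data.
rows = [(1, 90.0), (1, 110.0), (1, 100.0),
        (2, 150.0), (2, 130.0), (2, 170.0)]

def summarise(rows):
    """Group values by factor level and return {level: (mean, sample sd)}."""
    groups = defaultdict(list)
    for level, value in rows:
        groups[level].append(value)
    # stdev() uses the n - 1 denominator, i.e. the sample standard deviation.
    return {level: (mean(vals), stdev(vals))
            for level, vals in sorted(groups.items())}

summary = summarise(rows)
for level, (m, s) in summary.items():
    print(f"level {level}: mean = {m:.5f}, sd = {s:.5f}")
```

The sample standard deviation (divisor n − 1) is used here because that is the quantity the T-tests in this lab are built on.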
(a) Construct a box plot for yld for each level of feTt and display in a single chart. Compare the location and scale of the two samples [3 marks].
One sample T-test
To begin we just consider the yld measurements where no fertilizer has been used (feTt = 1).
(b) Using significance level α = 0.05, perform a one sample T-test to determine if (population) mean yield when no fertilizer is used is less than 11500 lbs per acre. Write down the hypotheses, the test statistic and p-value, the test decision (with reason) and conclusion (using a minimum of mathematical language) [3 marks].
Hint. Remember that yld is measured in 100lbs per acre.
(c) Using significance level α = 0.05, perform a one sample T-test to determine if (population) mean yield when no fertilizer is used is different to 8000 lbs per acre. Write down the hypotheses, the test statistic and p-value, the test decision (with reason) and conclusion (using a minimum of mathematical language) [3 marks].
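The mechanics of a one sample T-test can be sketched from the summary statistics alone. The following is a minimal Python sketch (standard library only), not the lab's R code: the statistic is t = (sample mean − μ0) / (sd / √n) with n − 1 degrees of freedom. The helper names `t_cdf` and `one_sample_t` are assumptions made for this example, and the t distribution is integrated numerically with Simpson's rule because the Python standard library has no built-in t distribution. The hypothesised value 11500 lbs per acre is entered as 115, since yld is recorded in 100 lbs per acre.

```python
import math

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t with df degrees of freedom."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    # Simpson's rule on [0, |t|]; steps must be even.
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3
    # The density is symmetric about 0, so add or subtract the area from 0.5.
    return 0.5 + math.copysign(area, t)

def one_sample_t(mean, sd, n, mu0):
    """Return (t statistic, degrees of freedom) from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

# Summary statistics for feTt = 1 as reported in the R output above.
t_stat, df = one_sample_t(mean=96.08333, sd=35.92244, n=12, mu0=115.0)
p_lower = t_cdf(t_stat, df)  # lower-tail p-value for H1: mu < mu0
print(f"t = {t_stat:.4f}, df = {df}, lower-tail p = {p_lower:.4f}")
```

For a two-sided alternative the same helpers can be reused with a different μ0, taking the p-value as `2 * min(t_cdf(t, df), 1 - t_cdf(t, df))`.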
Power analysis for one sample T-test
The R code file contains power analysis that is not assessed. The power of a hypothesis test is the probability of rejecting the null hypothesis when it is false. It is related to the probability of a Type II error (denoted β), which is the probability of retaining the null hypothesis when it is false.
The other type of error that can be made is called Type I, which is rejecting the null hypothesis when it is true. The probability of this is the significance level α.
When designing an experiment, the experimenter sets the probabilities of Type I and II errors (α and β) and then calculates the sample size necessary to achieve these.
To perform the analysis for the test just performed in (b), we are going to assume that μ1 = 100 (power analysis requires us to set such a value). R returned the following output.
The power of the test was
Prob(Reject H0 |H0 is false) = 0.3868396
which is quite low.
This implies that the probability of a Type II error was
β = Prob(Type II error) = Prob(Retain H0 |H0 is false)
= 1 − Prob(Reject H0 |H0 is false)
= 1 − 0.3868396 = 0.6131604
which is quite high.
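The relationship between power and β can be illustrated with a rough calculation. The sketch below is in Python (standard library only) and uses a normal approximation, power ≈ Φ(|δ|√n / σ − z), where z is the 1 − α quantile of the standard normal, instead of the t-based calculation that R performs. It therefore will not reproduce the figure quoted above exactly. It also assumes that the sample standard deviation for feTt = 1 is used as the planning value of σ; the function name `approx_power_one_sided` is an assumption made for this example.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_one_sided(delta, sd, n, alpha=0.05):
    """Normal-approximation power of a one-sided, one sample test.

    delta is the assumed true difference from the hypothesised mean;
    sd is treated as known, which is why this only approximates the
    power of a T-test.
    """
    z = NormalDist()  # standard normal distribution
    return z.cdf(abs(delta) * sqrt(n) / sd - z.inv_cdf(1 - alpha))

# Hypothesised mean 115 versus assumed true mean 100 (units of 100 lbs per acre).
power = approx_power_one_sided(delta=115 - 100, sd=35.92244, n=12)
beta = 1 - power  # probability of a Type II error
print(f"approximate power = {power:.4f}, approximate beta = {beta:.4f}")
```

Treating σ as known ignores the extra uncertainty from estimating it, so the approximation tends to be somewhat optimistic for small samples such as n = 12.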
In the second run of the analysis, we calculate the necessary sample size to obtain power of 1 − β = 0.8 (a common level) in the test just performed in (b). Of course, changing the sample size would also change the test statistic.
We see that a sample size of n = 37 (rounding up as sample size must be a whole number) would have been needed.
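The sample size calculation can be sketched with the normal-approximation formula n ≈ ((z_α + z_β) σ / δ)², where z_α and z_β are the 1 − α and 1 − β quantiles of the standard normal, rounded up. This Python sketch (standard library only) is not the t-based routine used in the R code file, so it can come out slightly below the n reported above. The planning value of σ is assumed to be the sample standard deviation for feTt = 1, and the function name `approx_sample_size` is an assumption made for this example.

```python
from math import ceil
from statistics import NormalDist

def approx_sample_size(delta, sd, alpha=0.05, power=0.8):
    """Normal-approximation sample size for a one-sided, one sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # quantile controlling the Type I error
    z_beta = z.inv_cdf(power)       # quantile controlling the Type II error
    # Round up because a sample size must be a whole number.
    return ceil(((z_alpha + z_beta) * sd / abs(delta)) ** 2)

n_needed = approx_sample_size(delta=115 - 100, sd=35.92244)
print(f"approximate sample size = {n_needed}")
```

The t-based calculation asks for a slightly larger n because it also accounts for estimating σ from the data.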
Two sample independent T-test
We now compare the yld measurements where no fertilizer has been used and where a low quantity of fertilizer has been used (feTt = 1 against feTt = 2).
(d) Using significance level α = 0.05, perform a two sample independent upper tail T-test (assuming unequal variances) to determine if mean yield for low quantity fertiliser is more than 2000 lbs per acre higher than for no fertiliser. Write down the hypotheses, the test statistic and p-value, the test decision (with reason) and conclusion (using a minimum of mathematical language) [3 marks].
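The unequal-variance (Welch) form of the two sample test can also be sketched from summary statistics. In this Python sketch (standard library only, not the lab's R code), the statistic is t = (mean2 − mean1 − δ0) / √(s1²/n1 + s2²/n2), with degrees of freedom from the Welch-Satterthwaite formula. The helper names `welch_t` and `t_cdf` are assumptions made for this example, and the hypothesised difference of 2000 lbs per acre is entered as 20 because yld is in 100 lbs per acre.

```python
import math

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t (df may be non-integer)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    # Simpson's rule on [0, |t|]; steps must be even.
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + math.copysign(total * h / 3, t)

def welch_t(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Return (t, df) for H0: mu2 - mu1 = delta0 with unequal variances."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first sample mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second sample mean
    t = (mean2 - mean1 - delta0) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics from the R output above (group 1: none, group 2: low).
t_stat, df = welch_t(96.08333, 35.92244, 12, 148.43333, 43.73857, 12, delta0=20.0)
p_upper = 1 - t_cdf(t_stat, df)  # upper-tail p-value
print(f"t = {t_stat:.4f}, df = {df:.2f}, upper-tail p = {p_upper:.4f}")
```

The degrees of freedom are generally not a whole number under this approximation, which is why the numerical `t_cdf` accepts a non-integer df.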
The tests just performed rely on the data samples being normally distributed, as the sample sizes are too small to rely on the sample means being normally distributed if the data itself is not normal. Next week we will look at how we can assess whether this assumption has been met.
Question 2 [6 marks]
In this question we consider the time taken to perform a task under low and high noise conditions. The variables we consider are summarised in the table below.
Name | Type | Description
time | continuous numerical | time taken to perform task in seconds
noise | factor | background noise: 1 (low), 2 (high)
The sample data consists of 20 paired observations of time, one for each level of noise (data available in lab2b.csv on Canvas). In the experiment, 20 randomly selected individuals were asked to perform a task twice: once with low background noise and again with high background noise.
The statistical model for the experiment is
time_{ij} = μ_i + E_{ij},  i ∈ {1, 2},  j ∈ {1, 2, …, 20},
where
• time_{ij} is the time taken by the j-th individual with noise = i
• μ_i is the population mean time taken with noise = i
• E_{ij} is the random effect from the j-th individual with noise = i.
The components of the experiment design:
• factor – background noise (variable noise) with levels 1 (low) and 2 (high)
• treatments – same as the factor levels as there is only 1 factor
• experimental units – the group of 20 individuals used twice
• measurement units – the 20 individuals used in the experiment
• response variable – time taken to perform task (variable time).
First, we compute sample means of time for each level of noise (R output copied below).
1 2
46.4455 55.1035
Next, we compute sample standard deviations of time for each level of noise (R output copied below).
1 2
5.196913 6.014407
(a) Construct a box plot for time for each level of noise and display in a single chart. Compare the location and scale of the two samples [3 marks].
Two sample paired T-test
(b) Using significance level α = 0.05, perform a two sample paired upper tail T-test to determine if mean time taken for high background noise is more than 5 seconds higher than for low noise. Write down the hypotheses, the test statistic and p-value, the test decision (with reason) and conclusion (using a minimum of mathematical language) [3 marks].
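A paired T-test is a one sample T-test applied to the within-individual differences, so it needs the standard deviation of those differences. The per-level standard deviations reported above are not enough on their own. The Python sketch below (standard library only, not the lab's R code) shows the calculation on a small set of hypothetical times invented for the example; they are not the contents of lab2b.csv, and the function name `paired_t` is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second, delta0=0.0):
    """Return (t, df) for H0: mean(second - first) = delta0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample sd of the differences
    return (mean(diffs) - delta0) / standard_error, n - 1

# Hypothetical times in seconds for 5 individuals, NOT the real lab2b.csv data.
low_noise = [40.0, 45.0, 50.0, 55.0, 60.0]
high_noise = [47.0, 55.0, 58.0, 66.0, 67.0]

t_stat, df = paired_t(low_noise, high_noise, delta0=5.0)
print(f"t = {t_stat:.4f}, df = {df}")
```

Because each individual acts as their own control, variation between individuals cancels out of the differences, which is the reason for pairing rather than treating the two noise levels as independent samples.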
2023-08-23