
Understanding Data and Statistical Design (60117)

Lab 2: One factor, two level experiments

This lab is marked out of 18.

Please submit in PDF format via Canvas.

Due by 23:59 Saturday 19th August 2023.

In this week's lab we look at a range of T-tests.

Question 1 [12 marks]

In this question we analyse the effect of fertilizer on the yield of a type of grass. The variables we consider are summarised in the table below.

Name    Type                    Description
yld     continuous numerical    grass yield in 100 lbs per acre
feTt    factor                  fertilizer quantity: 1 (none), 2 (low)

The sample data consists of 12 observations of yld for each level of feTt (data available in lab2a.csv on Canvas). Not much information is available with this data set, so we will suppose that 24 plots were prepared in a homogeneous manner. The 2 fertilizer concentrations were randomly assigned 12 times each across these 24 plots.

The statistical model for this experiment is

$$\text{yld}_{ij} = \mu_i + E_{ij}, \qquad i \in \{1, 2\}, \qquad j \in \{1, 2, \dots, 12\},$$

where

•   $\text{yld}_{ij}$ is the yield from the j-th plot with feTt = i

•   $\mu_i$ is the population mean yield with feTt = i

•   $E_{ij}$ is the random effect from the j-th plot with feTt = i.

The components of the experiment design:

•   factor – fertiliser quantity (variable feTt) with levels 1 (none) and 2 (low)

•   treatments – same as the factor levels as there is only 1 factor

•   experimental units – the 2 groups of 12 plots each allocated a treatment

•   measurement units – the 24 plots used in the experiment

•   response variable – grass yield (variable yld).

First, we compute sample means of yld for each level of feTt (R output copied below).

        1         2
 96.08333 148.43333

Next, we compute sample standard deviations of yld for each level of feTt (R output copied below).

       1        2
35.92244 43.73857

(a) Construct a box plot for yld for each level of feTt and display in a single chart. Compare the location and scale of the two samples [3 marks].
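The group summaries and the chart in (a) could be produced along the following lines. This is only a sketch: it assumes the data frame has columns named yld and feTt as written in the variable table, and the six values below are made-up placeholders, not the contents of lab2a.csv.

```r
# Placeholder data only (NOT the values in lab2a.csv). For the real analysis,
# replace this with something like: dat <- read.csv("lab2a.csv")
# The column names yld and feTt are assumed from the variable table.
dat <- data.frame(
  yld  = c(90, 110, 75, 150, 170, 120),
  feTt = c(1, 1, 1, 2, 2, 2)
)

# Treat the fertilizer code as a factor rather than as a number
dat$feTt <- factor(dat$feTt)

# Sample mean and sample standard deviation of yld within each level of feTt
grp_mean <- tapply(dat$yld, dat$feTt, mean)
grp_sd   <- tapply(dat$yld, dat$feTt, sd)
print(grp_mean)
print(grp_sd)

# Side-by-side box plots of yld, one box per level of feTt, in a single chart
boxplot(yld ~ feTt, data = dat,
        xlab = "feTt (1 = none, 2 = low)",
        ylab = "yld (100 lbs per acre)")
```

When comparing the boxes, the medians indicate location, while the box heights (interquartile ranges) and whisker lengths indicate scale.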

One sample T-test

To begin we just consider the yld measurements where no fertilizer has been used (feTt = 1).

(b) Using significance level α = 0.05, perform a one sample T-test to determine if the (population) mean yield when no fertilizer is used is less than 11500 lbs per acre. Write down the hypotheses, the test statistic and p-value, the test decision (with reason) and conclusion (using a minimum of mathematical language) [3 marks].

Hint. Remember that yld is measured in 100lbs per acre.

(c) Using significance level α = 0.05, perform a one sample T-test to determine if the (population) mean yield when no fertilizer is used is different to 8000 lbs per acre. Write down the hypotheses, the test statistic and p-value, the test decision (with reason) and conclusion (using a minimum of mathematical language) [3 marks].
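As a rough check on the mechanics of (b) and (c), the one sample T statistic can be computed from the summary statistics reported above (sample mean 96.08333, sample standard deviation 35.92244, n = 12). The helper one_sample_t below is a hypothetical name introduced for illustration and is not part of the lab's R code file. The null values 115 and 80 are the thresholds 11500 lbs and 8000 lbs converted to units of 100 lbs per acre.

```r
# Hypothetical helper: one sample T-test from summary statistics.
# xbar = sample mean, s = sample SD, n = sample size, mu0 = null value.
one_sample_t <- function(xbar, s, n, mu0,
                         alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  se     <- s / sqrt(n)              # standard error of the sample mean
  t_stat <- (xbar - mu0) / se        # test statistic
  df     <- n - 1                    # degrees of freedom
  p <- switch(alternative,
    less      = pt(t_stat, df),                           # lower tail
    greater   = pt(t_stat, df, lower.tail = FALSE),       # upper tail
    two.sided = 2 * pt(abs(t_stat), df, lower.tail = FALSE)
  )
  list(statistic = t_stat, df = df, p.value = p)
}

# Summary statistics for feTt = 1, copied from the R output in this lab
xbar1 <- 96.08333
s1    <- 35.92244
n1    <- 12

# (b) lower-tail test against 115 (i.e. 11500 lbs per acre)
res_b <- one_sample_t(xbar1, s1, n1, mu0 = 115, alternative = "less")

# (c) two-sided test against 80 (i.e. 8000 lbs per acre)
res_c <- one_sample_t(xbar1, s1, n1, mu0 = 80, alternative = "two.sided")

print(res_b)
print(res_c)
```

With the raw data loaded, the built-in t.test function (for example, t.test(x, mu = 115, alternative = "less"), where x holds the yld values for feTt = 1) should give the same statistic and p-value up to rounding of the summary statistics.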

Power analysis for one sample T-test

The R code file contains a power analysis that is not assessed. The power of a hypothesis test is the probability of rejecting the null hypothesis when it is false. It is related to the probability of a Type II error (denoted β), which is the probability of retaining the null hypothesis when it is false.

The other type of error that can be made is called Type I, which is rejecting the null hypothesis when it is true. The probability of this is the significance level α.

When designing an experiment, the experimenter sets the probabilities of Type I and II errors (α and β) and then calculates the sample size necessary to achieve these.
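The claim that the Type I error probability equals the significance level can be illustrated by simulation. The sketch below repeatedly generates normal samples for which the null hypothesis is true and counts how often a lower-tail one sample T-test rejects it. The population standard deviation of 36, the seed and the number of replications are arbitrary illustrative choices, not values taken from the lab.

```r
set.seed(2023)     # arbitrary seed, used only so the run is reproducible
alpha <- 0.05      # significance level
n     <- 12        # sample size, as in Question 1
mu0   <- 115       # null value, in 100 lbs per acre
sigma <- 36        # assumed population SD (illustrative choice)

# One simulated experiment: draw n yields whose true mean is mu0 (so H0 is
# true), run the lower-tail one sample T-test, and report whether H0 is rejected
reject_once <- function() {
  x      <- rnorm(n, mean = mu0, sd = sigma)
  t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
  t_stat < qt(alpha, df = n - 1)
}

# Proportion of simulated experiments in which H0 was (wrongly) rejected
type1_rate <- mean(replicate(5000, reject_once()))
print(type1_rate)   # expected to be close to alpha, up to simulation noise
```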

To perform the analysis for the test just performed in (b), we are going to assume that $\mu_1 = 100$ (power analysis requires us to set such a value). R returned the following output.

The power of the test was

$$\text{Prob}(\text{Reject } H_0 \mid H_0 \text{ is false}) = 0.3868396$$

which is quite low.

This implies that the probability of a Type II error was

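$$\begin{aligned}
\beta = \text{Prob}(\text{Type II error}) &= \text{Prob}(\text{Retain } H_0 \mid H_0 \text{ is false}) \\
&= 1 - \text{Prob}(\text{Reject } H_0 \mid H_0 \text{ is false}) \\
&= 1 - 0.3868396 = 0.6131604
\end{aligned}$$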

which is quite high.
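The R code file itself is not reproduced in this handout, so the following is only a guess at how such a power figure might be obtained. It assumes the base R function power.t.test was used, with the sample standard deviation 35.92244 standing in for the unknown population standard deviation and a difference of 115 − 100 = 15 between the null value in (b) and the assumed true mean.

```r
n     <- 12          # sample size for feTt = 1
s     <- 35.92244    # sample SD for feTt = 1, used as a stand-in for sigma (assumption)
mu0   <- 115         # null value in (b), in 100 lbs per acre
mu1   <- 100         # assumed true mean for the power analysis
alpha <- 0.05

# Power of the one-sided, one sample T-test via the built-in function
pw <- power.t.test(n = n, delta = abs(mu1 - mu0), sd = s, sig.level = alpha,
                   type = "one.sample", alternative = "one.sided")
print(pw$power)

# The same quantity from the noncentral t distribution:
# P(T > critical value) where T has noncentrality sqrt(n) * delta / sd
ncp          <- sqrt(n) * abs(mu1 - mu0) / s
t_crit       <- qt(alpha, df = n - 1, lower.tail = FALSE)
power_manual <- pt(t_crit, df = n - 1, ncp = ncp, lower.tail = FALSE)

# Probability of a Type II error
beta <- 1 - pw$power
print(beta)
```

If these assumptions match the R code file, the printed power should be close to the value reported above.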

In the second run of the analysis, we calculate the necessary sample size to obtain a power of 1 − β = 0.8 (a common level) in the test just performed in (b). Of course, changing the sample size would also change the test statistic.

We see that a sample size of n = 37 (rounding up as sample size must be a whole number) would have been needed.
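A sample size calculation of this kind can be sketched with power.t.test by supplying the target power and leaving n unspecified. As before, this assumes the sample standard deviation 35.92244 is used in place of the population value and that the true mean is 100 against a null value of 115. Whether the lab's R code file does exactly this is not shown in the handout.

```r
s     <- 35.92244    # sample SD for feTt = 1, stand-in for sigma (assumption)
delta <- 115 - 100   # null value minus assumed true mean, in 100 lbs per acre
alpha <- 0.05
target_power <- 0.8

# Solve for the (fractional) sample size that achieves the target power
ss <- power.t.test(power = target_power, delta = delta, sd = s,
                   sig.level = alpha, type = "one.sample",
                   alternative = "one.sided")

# Round up, since a sample size must be a whole number
n_needed <- ceiling(ss$n)
print(n_needed)

# Power actually achieved at the rounded-up sample size
achieved <- power.t.test(n = n_needed, delta = delta, sd = s,
                         sig.level = alpha, type = "one.sample",
                         alternative = "one.sided")$power
print(achieved)
```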

Two sample independent T-test

We now compare the yld measurements where no fertilizer has been used and where a low quantity of fertilizer has been used (feTt = 1 against feTt = 2).

(d) Using significance level α = 0.05, perform a two sample independent upper tail T-test (assuming unequal variances) to determine if the mean yield for low quantity fertiliser is more than 2000 lbs per acre higher than for no fertiliser. Write down the hypotheses, the test statistic and p-value, the test decision (with reason) and conclusion (using a minimum of mathematical language) [3 marks].
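The unequal-variance (Welch) version of the two sample test can also be sketched from the summary statistics reported earlier. The helper welch_t is a hypothetical name used for illustration. The hypothesised difference of 20 is 2000 lbs per acre expressed in units of 100 lbs.

```r
# Hypothetical helper: upper-tail Welch T-test of (mean2 - mean1) > delta0,
# computed from group means (m), SDs (s) and sizes (n)
welch_t <- function(m1, s1, n1, m2, s2, n2, delta0 = 0) {
  v1 <- s1^2 / n1                        # squared standard error, group 1
  v2 <- s2^2 / n2                        # squared standard error, group 2
  se <- sqrt(v1 + v2)                    # standard error of the difference
  t_stat <- ((m2 - m1) - delta0) / se    # test statistic
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p  <- pt(t_stat, df, lower.tail = FALSE)   # upper-tail p-value
  list(statistic = t_stat, df = df, p.value = p)
}

# Summary statistics copied from the R output in this lab
res_d <- welch_t(m1 = 96.08333,  s1 = 35.92244, n1 = 12,   # feTt = 1 (none)
                 m2 = 148.43333, s2 = 43.73857, n2 = 12,   # feTt = 2 (low)
                 delta0 = 20)
print(res_d)
```

With the raw data, a call of the form t.test(x_low, x_none, mu = 20, alternative = "greater", var.equal = FALSE), where x_low and x_none are hypothetical names for the two vectors of yields, should agree with this up to rounding.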

The tests just performed rely on the data samples being normally distributed, as the sample sizes are too small to rely on the sample means being normally distributed if the data itself is not normal. Next week we will look at how we can assess whether this assumption has been met.

Question 2 [6 marks]

In this question we consider the time taken to perform a task under low and high noise conditions. The variables we consider are summarised in the table below.

Name     Type                    Description
time     continuous numerical    time taken to perform task in seconds
noise    factor                  background noise: 1 (low), 2 (high)

The sample data consists of 20 paired observations of time, one for each level of noise (data available in lab2b.csv on Canvas). In the experiment, 20 randomly selected individuals were asked to perform a task twice: once with low background noise and again with high background noise.

The statistical model for the experiment is

$$\text{time}_{ij} = \mu_i + E_{ij}, \qquad i \in \{1, 2\}, \qquad j \in \{1, 2, \dots, 20\},$$

where

•   $\text{time}_{ij}$ is the time taken by the j-th individual with noise = i

•   $\mu_i$ is the population mean time taken with noise = i

•   $E_{ij}$ is the random effect from the j-th individual with noise = i.

The components of the experiment design:

•   factor – background noise (variable noise) with levels 1 (low) and 2 (high)

•   treatments – same as the factor levels as there is only 1 factor

•   experimental units – the group of 20 individuals used twice

•   measurement units – the 20 individuals used in the experiment

•   response variable – time taken to perform task (variable time).

First, we compute sample means of time for each level of noise (R output copied below).

      1       2
46.4455 55.1035

Next, we compute sample standard deviations of time for each level of noise (R output copied below).

       1        2
5.196913 6.014407

(a) Construct a box plot for time for each level of noise and display in a single chart. Compare the location and scale of the two samples [3 marks].

Two sample paired T-test

(b) Using significance level α = 0.05, perform a two sample paired upper tail T-test to determine if mean time taken for high background noise is more than 5 seconds higher than for low noise. Write down the hypotheses, the test statistic and p-value, the test decision (with reason) and conclusion (using a minimum of mathematical language) [3 marks].
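A paired T-test is equivalent to a one sample T-test on the within-individual differences, which the sketch below illustrates. The two vectors are made-up placeholder times for six hypothetical individuals, not the values in lab2b.csv. Unlike the earlier tests, this one cannot be reproduced from the two group standard deviations alone, because the standard deviation of the differences also depends on how strongly the paired measurements are correlated.

```r
# Placeholder times in seconds (NOT the lab data); position k in each vector
# belongs to the same hypothetical individual
low  <- c(44.1, 47.3, 50.2, 41.8, 46.0, 52.4)   # noise = 1 (low)
high <- c(52.0, 55.9, 54.1, 50.3, 57.2, 58.8)   # noise = 2 (high)

# Within-individual differences (high minus low)
d  <- high - low
n  <- length(d)
d0 <- 5          # hypothesised difference in seconds under H0

# One sample upper-tail T-test on the differences
t_stat <- (mean(d) - d0) / (sd(d) / sqrt(n))
p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# The built-in paired test gives the same statistic and p-value
chk <- t.test(high, low, paired = TRUE, mu = d0, alternative = "greater")

print(c(t = t_stat, p = p_val))
print(chk)
```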