Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Biostatistics

Lab 5:  Biological pest control

Background

• ANOVA is used to determine whether the means of several groups differ

•  Post hoc comparisons via the Tukey-Kramer method are used to determine which groups are significantly different from each other without inflating the family-wise type I error

Objectives

•  Use diagnostic plots to determine whether conducting ANOVA is appropriate

•  Perform multiple post hoc comparisons while controlling for Type I error

Instructions

•  Launch RStudio, open the R Markdown template file biostatistics-labTemplate.Rmd and save it as lab5-yourLastName .Rmd

• Complete the activities outlined below by adding the appropriate commands in R code chunks

•  Embed your interpretation of the results below the relevant R code chunks

Code Examples

R has several datasets that are available internally (i.e.  without needing to download them from any- where).  Today, we’re going to use one of these datasets to investigate running ANOVAs.  Begin by loading the mtcars”dataset and looking at the first few rows:

data(mtcars)

head(mtcars)

#                                       mpg  cyl  disp    hp  drat        wt    qsec  vs  am  gear  carb

#  Mazda  RX4                  21 .0      6    160  110  3 .90  2 .620  16 .46    0    1        4        4

#  Mazda  RX4  Wag          21 .0      6    160  110  3 .90  2 .875  17 .02    0    1        4        4

#  Datsun  710                22 .8      4    108    93  3 .85  2 .320  18 .61    1    1        4        1

#  Hornet  4  Drive        21 .4      6    258  110  3 .08  3 .215  19 .44    1    0        3        1

#  Hornet  Sportabout  18 .7      8    360  175  3 .15  3 .440  17 .02    0    0        3        2

#  Valiant                      18 .1      6    225  105  2 .76  3 .460  20 .22    1    0        3        1

Then, we’re going to visually inspect the miles per gallon for car models that have 4 cylinders versus cars that have 8 cylinders:

par(mfrow  =  c(1 ,  1))

hist(mtcars$mpg[mtcars$cyl  ==  4],  xlab  =  "Miles  Per  Gallon" ,

main  =  "MPG  by  Number  of  Cylinders" ,  col  =  adjustcolor("darkblue" ,

alpha .f  =  0.3),  xlim  =  c(10 ,  34),  ylim  =  c(0 ,  4),

breaks  =  10)

hist(mtcars$mpg[mtcars$cyl  ==  8],  col  =  adjustcolor("darkgreen" ,

alpha .f  =  0.3),  breaks  =  10 ,  add  =  T)

legend("topright" ,  legend  =  c("4  cylinders" ,  "8  cylinders"),

col  =  c(adjustcolor("darkblue" ,  alpha .f  =  0.3),  adjustcolor("darkgreen" ,

alpha .f  =  0.3)),  pch  =  20 ,  pt .cex  =  2)

MPG by Number of Cylinders

  4 cylinders   8 cylinders

 

10         15         20         25         30

Miles Per Gallon

It looks like there’s a clear difference between the two groups, but we can test that using the aov function.  It is often helpful to save the model as its own object, from which you have many further options (like using summary to look at the statistical significance or pulling out the coefficients).

#  Keep  only  data  to  be  analyzed  in  ANOVA

cars4 .8  <- mtcars[mtcars$cyl  ==  4  | mtcars$cyl  ==  8,  ]

mpg .aov  <-  aov(mpg  ~  cyl,  data  =  cars4 .8)

summary(mpg .aov)

#                         Df  Sum  Sq Mean  Sq  F  value      Pr(>F)

#  cyl                    1    823 .7      823 .7      65 .65  3 .45e-08  ***

#  Residuals      23    288 .6        12 .5

#  ---

#  Signif .  codes:    0  '***'  0 .001  '**'  0 .01  '*'  0 .05  ' . '  0 .1  '  '  1

#  str(mpg .aov)  # Use  this  to  look  at  all  the

#  attributes  of  the model

mpg .aov$coefficients

#  (Intercept) #      38 .227273

cyl

-2 .890909

Remember that an ANOVA allows for the comparison of more than two groups, so we can easily change the data back to the full dataset to evaluate all the cars!

mpg .aov .all  <-  aov(mpg  ~  cyl,  data  = mtcars)

summary(mpg .aov .all)

#                         Df  Sum  Sq Mean  Sq  F  value      Pr(>F)

#  cyl                    1    817 .7      817 .7      79 .56  6 .11e-10  ***

#  Residuals      30    308 .3        10 .3

#  ---

#  Signif .  codes:    0  '***'  0 .001  '**'  0 .01  '*'  0 .05  ' . '  0 .1  '  '  1

Activities

You are the chief scientist at a  biotech company that is attempting to develop safe and effective products to combat agricultural pests, which cost the US economy millions of dollars annually.  Your area of expertise is biological control via the use of bacteria that target pests without affecting other beneficial insects.   The development of such effective biological control agents would be invaluable because they could potentially replace dangerous pesticides that kill indiscriminately and have a tendency to accumulate as they travel up to food chain, thus potentially harming humans.

Task 1:  Identifying promising biological control agents

After two years of active research, your team has developed four biological control agents in the form of bacteria that target distinct biochemical pathways in pests controlling reproduction, metabolism, and immunity.  To test their relative effectiveness, you decide to run an experiment whose results can be downloaded from the web:

#  Download  the  data

d1  <-  read .csv("http://faraway .neu .edu/biostats/lab5_dataset1 .csv")

1. To determine whether the biological control agent was successful in increasing the mortality of the pest, you decide to conduct an ANOVA. Formulate the null and alternative hypotheses.

2.  Before conducting your analysis, plot the data to determine whether it adheres to the assumptions of ANOVA. Specifically, use function hist to determine whether mortality is normally distributed. Does the distribution of mortality seem normal?

3.  If the data are not normally distributed, a log10  transformation can sometimes help.   Plot a histogram of log10-transformed and non-transformed pest mortality on two separate panels.  Is a transformation necessary in this case? When doing further analysis, would you use the raw data or the transformed data?

4.  Now you decide to run an ANOVA to determine whether the mean mortality differed between your treatment groups.  In R, ANOVA models are fit in the following way mod .aov=aov(y  ∼ x,  data=d), where y is the response variable in dataset d, x is the explanatory or grouping variable in dataset d, and mod is the variable in which the ANOVA aov object is stored. Starting with this example,  replace x, y, mod, and d with the variables of interest in each case.   To extract different parts of the mod object, R provides a number of important utility functions such as summary(mod), which provides the standard ANOVA table, coef(mod) which provides the coefficients, and resid(mod) which extracts the residuals. Other components of aov objects can be accessed directly via the $ operator (look at str(mod) for a list of these objects and consult the help file for additional details via ?aov).  Use aov to fit an ANOVA model to your data and use the summary function to display the ANOVA table.

5.  Use the ANOVA table to determine whether there is a difference in the mean mortality of the pest across the different groups.

6. To determine which groups have significantly different means, you decide to run post-hoc pairwise comparisons using the Tukey-Kramer method. Why not simply use t-tests between each pair of means?

7.  Use function TukeyHSD to perform the post-hoc pairwise comparisons on the aov object and print the results to the screen. The p  adj column contains the adjusted p-values for the comparisons described in the first column.

8. To visualize these differences, use the plot function to create a figure of the TukeyHSD object and interpret the results.

9. To improve upon the first figure, you decide to use a barplot to display the means and standard errors of mortality for each group, along with labels over each bar to identify means that are significantly different. Because the process of labeling each bar can be time-consuming and error- prone when dealing with many groups, download and use the package multcompView by issuing the following commands: install .packages("multcompView");  require(multcompView)

10.  First, determine what labels go on each bar to distinguish which values are significantly different. To do this, you will have to use the multcompLetters function.   Extract appropriate letters can be found in the TukeyHSD object you created earlier.  In this case, the component would be mod .tukey$treatment[,  "p  adj"], if you named your TukeyHSD object mod .tukey.  Modify the following example to work with your objects.  Then, re-order the labels (control, t1, t2, t2, and then t4).

labels  <- multcompLetters(mod .tukey$treatment[,  "p  adj"])$Letters

11.  Now, use function aggregate to compute the mean and standard error of pest mortality for each group, and plot the results using a barplot.  Don’t forget to add the text labels that you worked so hard to create in the previous step. Start with this template for making the barplot:

#  Compute means

means  <-  aggregate(XXXX  ~  XXXX, mean,  data  =  XXXX)

#  Compute  standard  errors

stderrs  <-  aggregate(XXXX  ~  XXXX,  FUN  =  function(x)  XXXX,

data  =  XXXX)

#  Create  barplot  figure

bp  <-  barplot(XXXX,  names  =  XXXX,  ylab  =  "Mean mortality  of  pest" ,

ylim  =  range(XXXX))

#  Add  standard  error  bars

arrows(x0  =  XXXX,  y0  =  XXXX,  y1  =  XXXX,  code  =  3 ,  angle  =  90 ,

length  =  0.1)

box()

#  Add  text  labels

text(x  =  XXXX,  y  =  XXXX,  XXXX,  pos  =  3)

12.  Describe the results from the figure. Specifically, which treatment seems most effective in increas- ing pest mortality and why?

Task 2:  Mitigating impacts on pest enemies

Although we have found some promising biological control agents, we must ensure that they do not interact antagonistically with natural enemies of pests such as predators. Indeed, if the biological control agents prevent the predators from consuming the pests, the net effect of the biological control agent could be either neutral, if the gains in pest mortality caused by the biological control agent are offset by the losses in mortality due to lower pest predation, or even negative! To ensure that the biological control agents do not affect pest mortality rates due to natural predators, you conduct a second experiment in which you place the the predators of the pests in isolated chambers and expose them to each treatment (control, t1-t4). Download the data from the web:

#  Download  the  data

d2  <-  read .csv("http://faraway .neu .edu/biostats/lab5_dataset2 .csv")

1. To ensure that the biological control agent does not inadvertently reduce the effectiveness of natural predators of the pests, you decide to fit an ANOVA model to the data (via function aov). Does the ANOVA table suggest that the mean mortality is different across the treatments?

2. To determine which groups are different from each other in terms of mean pest mortality, perform post-hoc pairwise comparisons using the Tukey-Kramer method by following the procedure outlined above.

3. Then, plot the mean and standard error of pest mortality for each group using a barplot and assign the appropriate label to each bar based on statistical significance.  Note:  use the same workflow as for the first labeled barplot.

4.  Describe the results and determine which treatment groups experienced the lowest rates of mor- tality for the enemy of the pest.

5.  Based on these results and those obtained in the last section, what treatment or biological control agent would you recommend and why?