EVOLUTIONARY BIOLOGY, BIOL 20212

COURSEWORK: NATURAL SELECTION IN THE WILD


Peter and Rosemary Grant and co-workers performed a classic study of natural selection and evolution in wild populations by studying Darwin’s finches in the Galápagos archipelago. Almost all of the medium ground finches (Geospiza fortis) on the islet of Daphne Major were marked and measured over many years. The Galápagos islands are subject to extreme weather fluctuations, resulting in extremely wet (El Niño) and extremely dry (La Niña) conditions. The Grants analysed quantitative responses of birds to these extreme weather events, which cause population booms and population crashes respectively. In a way, the extreme conditions bring about ‘natural experiments’ through which scientists can study microevolutionary change in action. The Grants’ observations on patterns of morphological change were backed up with evidence of the mechanisms responsible for change: they measured the forces necessary to crack seeds of a particular size, and related these forces to selection on bill morphology. Their study shows the value of long-term studies under natural conditions in understanding evolutionary principles.

The famous evolutionary biologist J.B.S. Haldane once said ‘No scientific theory is worth anything unless it enables us to predict something which is actually going on. Until that is done, theories are a mere game of words, and not such a good game as poetry’. The Grants’ study allows us to understand why selection brings about morphological change in populations and, because we understand the mechanisms behind that change, to predict what changes may follow future climatic events.

The data file that you have been given (darwinsfinches.csv, available on Blackboard) contains a subset of data from the Darwin’s finch study. Data on bill depths are provided for Daphne Major birds before and after a La Niña-induced population crash, and for birds from Santa Cruz, 10 km away. The aim of this practical is to use the data set to illustrate the four postulates of evolution by natural selection.

When reporting statistical tests, cite the test used, the parameter values tested (e.g. means), a measure of variability (e.g. SDs), the test statistic and the degrees of freedom.

You have been provided with a comma separated values (csv) file, darwinsfinches.csv, containing the following columns:

Column          Contents
daphne_all      bill depths of all birds from Daphne Major (mm)
survive         Boolean values (1 or 0) indicating whether the bird survived the event (1 for survived, 0 for died)
santa_all       bill depths of all birds from Santa Cruz (mm)
offspring       offspring bill depths (mm)
mother          mother bill depths (mm)
father          father bill depths (mm)
dead            bill depths of all birds that died on Daphne Major (mm)
alive           bill depths of all birds that survived on Daphne Major (mm)
both_islands    bill depths of all birds from Daphne Major and Santa Cruz (mm)
location        Boolean values (1 or 0) indicating whether the bird is from Daphne Major (0) or Santa Cruz (1)

Setting up analysis

You need to read in the data and reformat it slightly so that all analyses will run smoothly.

You need to set your working directory to where the data files are.

setwd("/PATH/TO/DATAFILES/")

Alternatively, use the ‘Session’ menu in RStudio as follows:

1. Create a sub-directory named “R” in your “Documents” folder, and place the csv data file in it.

2. From RStudio, use the menu to change your working directory under Session > Set Working Directory > Choose Directory.

3. Choose the directory you’ve just created in step 1

Now read in the data

data <- read.table('darwinsfinches.csv', header = TRUE, sep = ',')

head(data)

str(data)

Finally, we need to convert the survive and location columns to factors, because R currently treats them as numeric.

data$survive <- factor(data$survive , levels=0:1, labels=c("Dead", "Alive"))

data$location <- factor(data$location , levels=0:1, labels=c("Daphne", "Santa"))

str(data)
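
As a quick illustration of what the two factor() calls do, the sketch below applies the same conversion to a small made-up vector (toy_survive is hypothetical and not part of the data file). Codes of 0 and 1 become the labels "Dead" and "Alive", and any NA, such as an empty row in a shorter column, stays NA.

```r
# Hypothetical survival codes; NA stands for an empty row
toy_survive <- c(1, 0, 0, 1, NA, 0)

# Same conversion as used for data$survive above
toy_factor <- factor(toy_survive, levels = 0:1, labels = c("Dead", "Alive"))

levels(toy_factor)                   # level order follows 'levels': Dead, then Alive
table(toy_factor, useNA = "ifany")   # counts per label, with NA counted separately
```

Running the same two checks on data$survive and data$location is one way to confirm that the conversion behaved as intended.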

You’re ready to go!

Postulate 1. Individuals within species are variable.

Produce histograms of bill depth for Daphne Major non-survivors, Daphne Major survivors, and Santa Cruz birds (three histograms in total), using classes narrow enough to show relatively fine resolution. Calculate means and SDs for each of the 3 distributions and present these in a Table.

Box 1.

Descriptive statistics, and producing Histograms in R

Descriptive statistics. To calculate the means and standard deviations for alive and dead birds on Daphne Major and for all birds on Santa Cruz, you can use a function called “tapply”, which does a similar job to the function “aggregate” that you may already have met. The syntax is as follows

tapply(dependent_variable, grouping_variable, function, na.rm)

The dependent variable is what we measure (bill depth). The grouping variable is how we separate our data (either survive or location). The function is what we want to do to the data. Finally, na.rm will remove any values recorded as “NA” from the analysis if set to TRUE (we will always be setting it to TRUE).

Thus, if we want to calculate the mean and standard deviation of bill depth for alive and dead birds on Daphne Major, we would use the following.

tapply(data$daphne_all, data$survive, mean, na.rm = TRUE)

tapply(data$daphne_all, data$survive, sd, na.rm = TRUE)

Similarly, for the birds on Santa Cruz and Daphne Major

tapply(data$both_islands, data$location, mean, na.rm = TRUE)

tapply(data$both_islands, data$location, sd, na.rm = TRUE)
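
The outputs above can be collected into the single table requested for Postulate 1. The following is a minimal sketch, not the required method: it uses a small made-up data frame (toy) in place of the real file, and the helper name describe() is hypothetical.

```r
# Made-up bill depths (mm), laid out like the real columns
toy <- data.frame(
  daphne_all = c(8.9, 9.4, 9.1, 10.2, 9.8, 10.5),
  survive    = factor(c(0, 0, 0, 1, 1, 1), levels = 0:1,
                      labels = c("Dead", "Alive")),
  santa_all  = c(8.1, 8.6, 8.4, 9.0, 8.8, NA)
)

# Hypothetical helper: sample size, mean and SD of one vector, ignoring NA
describe <- function(x) {
  c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# One list element per distribution: Dead, Alive (both Daphne) and Santa
groups <- c(split(toy$daphne_all, toy$survive), list(Santa = toy$santa_all))

# One row per distribution, one column per statistic
summary_table <- t(sapply(groups, describe))
round(summary_table, 2)
```

With the real data, replacing toy with data should give the three rows needed for the Table, assuming the columns are named as listed earlier.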

Histograms. Plot three histograms to show the distribution of bill depths of alive birds on Daphne Major, dead birds on Daphne Major, and birds on Santa Cruz. These correspond to three columns in your data (alive, dead and santa_all), which you can identify using the table above.

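To make your histograms comparable, you need to standardise the x-axes. To calculate your minimum and maximum x-values, you can use the functions min() and max() on the “both_islands” column of the table, which holds the bill depths from both islands, and then add a small margin at each end. Setting na.rm = TRUE guards against any “NA” values in the column.

min_x <- min(data$both_islands, na.rm = TRUE) - 2

max_x <- max(data$both_islands, na.rm = TRUE) + 1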
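
Standardising the axis limits does not by itself standardise the class widths, because hist() chooses its classes separately for each data set. One option, shown here as a sketch on made-up values rather than the real data, is to build a single vector of class boundaries and pass it to every hist() call through the breaks argument. The 0.25 mm class width and the toy vectors are assumptions chosen for illustration.

```r
# Made-up bill depths (mm) standing in for two of the real columns
toy_alive <- c(9.6, 9.9, 10.1, 10.4, 10.8)
toy_dead  <- c(8.3, 8.7, 9.0, 9.2, 9.5, 9.9)

# Limits rounded outwards to whole millimetres so the breaks cover every value
toy_min <- floor(min(toy_alive, toy_dead))
toy_max <- ceiling(max(toy_alive, toy_dead))

# Shared class boundaries, 0.25 mm wide (an arbitrary choice)
class_breaks <- seq(toy_min, toy_max, by = 0.25)

# Two stacked panels drawn in grey with identical classes and axes
par(mfrow = c(2, 1))
h_alive <- hist(toy_alive, breaks = class_breaks, col = "grey",
                xlab = "Bill depth (mm)", main = "Alive")
h_dead  <- hist(toy_dead, breaks = class_breaks, col = "grey",
                xlab = "Bill depth (mm)", main = "Dead")
```

Because the breaks span the full range, the x-axes match without a separate xlim. A narrower or wider class width may suit the real data better.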

You can tell R to plot all three histograms in one figure with the following, which will stack the histograms one above the other.

par(mfrow = c(3,1))

hist(data$alive, xlim = c(min_x, max_x), xlab = "<x label>", main = "<title>")

hist(data$dead, xlim = c(min_x, max_x), xlab = "<x label>", main = "<title>")

hist(data$santa_all, xlim = c(min_x, max_x), xlab = "<x label>", main = "<title>")

Replace the words between the chevrons with your desired x labels/titles and remove the chevrons. You can modify your histograms in as many other ways as you like. To find out how, you can run “?hist” in the RStudio console. Your histograms should be in grey scale with standardised x-axes and should be of publication quality.

Postulate 2. Some of the variations are passed on to offspring.

Determine the mid-parent value for bill depth for each offspring with parent values. Plot this on the x-axis (independent variable) against the offspring bill depth on the y-axis (dependent variable). Fit a regression line to these data, and determine whether there is a significant relationship (cite the relevant statistics). The slope of the regression line is the statistic h², which is a measure of genetic resemblance between parents and offspring. h² measures what we term narrow-sense heritability. It is solely a measure of how much variation is due to the additive effects of genes, and it is this that allows us to predict how a population responds to selection. A value of 1 suggests that the additive effects of genes cause all of the variation; a value of 0 suggests that they cause none of it. What does your analysis suggest?

Box 2.

Regression analysis

Calculating mid-parent values

The mid-parent value is the mean bill depth of the two parents of the finch in question. Thus, we can add a column to our data frame with this information with the following command.

data$mid <- (data$mother + data$father)/2

Regressions in R

First, we need to create a linear model to describe the relationship between mid-parent bill depth and offspring bill depth. This can be done with the following commands.

First specify the model

lin_mod <- lm(data$offspring ~ data$mid)

You have been taught that there can be issues using $ when specifying linear models because of potential problems with post-hoc tests. An alternative syntax that avoids these problems would therefore be

lin_mod <- lm(offspring ~ mid, data = data)

Have a look at the model output

summary(lin_mod)

Finally, specify your plot window and plot, adding the regression line

par(mfrow = c(1,1))

plot(data$mid, data$offspring,

main = "<title>",

xlab = "<xlabel>", ylab = "<ylabel>")

abline(lin_mod)

Don’t forget, you ought to comment on whether your results are statistically significant. You can do this using an analysis of variance (ANOVA).

anova(lin_mod)

From here you can find the test statistic (F-value), degrees of freedom and P-value.

Postulate 3. In most generations, more offspring are produced than can survive.

This postulate is easily met: many of the offspring produced died in the population crash.

Box 3.

Calculating the mortality rate.

Here, we will simply use R as a calculator. No statistical tests are necessary.

The following commands count the dead and alive birds in a table and calculate the mortality rate as a percentage. The first entry of the table is the number of dead birds, because “Dead” is the first level of the survive factor.

survival <- table(data$survive)

mortality_rate <- survival[1]/(survival[1] + survival[2]) * 100

names(mortality_rate) <- NULL

mortality_rate

Postulate 4. Survival and reproduction are not random: individuals with the highest reproductive success or survival are those with the most favourable variations. They are ‘naturally selected’.

Test for differences in bill depth between Daphne birds that survived the population crash and those that died. Are particular birds favoured by natural selection? Remember that seeds became scarce during the drought, and that among the seeds that remained, large, hard seeds were disproportionately common. Why were particular bill morphologies selected? What sort of natural selection has occurred?

Now do some manual calculations with the values you have calculated in R.

1) Calculate the strength of selection (S),

S = t* - t

where t* is the mean bill depth of survivors, and t is the mean bill depth of the entire population.

[Figure]

2) Remember that natural selection produces descent with modification, i.e. evolution. Natural selection acts within a generation, evolution occurs between generations. Evolution is a response to selection (R). Calculate this response as

R = h²S

[Figure]

Note that the R statistic includes a measure of the heritability of a trait, and the strength of selection on that trait (see graphs above). What does R mean in terms of changes in bill depths in the next generation? By this stage, you have estimated how much of the variation in a trait is due to variation in genes, you have quantified the strength of selection that results from differences in survival, and you have combined these to predict how the population will change from one generation to the next. You have witnessed microevolution in action!

Box 4.

Testing for differences in R

First explore whether your data are normally distributed. There are several ways of doing this. You have learnt about using QQ plots already, and here we will use a Shapiro-Wilk test on the bill depths of alive and dead birds.

shapiro.test(data$alive)

shapiro.test(data$dead)

The Shapiro-Wilk test tells us whether the data deviate from a normal distribution. Thus, if the P-value < 0.05, we can infer that the data are not normally distributed. What we do next depends on whether the data are normally distributed. If they are, we can use the parametric two-sample t-test. You have been introduced to linear models, and a t-test is one commonly used and simple form of linear model for testing for differences between two samples.

For the t-test, we can run the following.

t.test(daphne_all ~ survive, data = data, var.equal = TRUE)

This tests for a difference between the two groups, assuming equal variances across groups. The output will give you the test statistic (t), degrees of freedom and P-value. It also gives the mean value for each group. If you analysed the data using a linear model approach, you would get an F statistic as your test statistic, and F is the square of t.

Otherwise, if the data do not fit a normal distribution, you should use the non-parametric Mann-Whitney U test.

wilcox.test(daphne_all ~ survive, data = data)

table(data$survive)

The first command runs a Mann-Whitney test (which is sometimes known as a Wilcoxon rank sum test, hence the function name). You should see output with the test statistic and P-value. R labels the statistic W, which is equivalent to the Mann-Whitney U. You also need to report your sample sizes with this test, which the second command will present to you.

Additional exercises

1. Compare differences in mean bill depth between all birds on Daphne and Santa Cruz statistically. Why might this difference exist? On Santa Cruz G. fortis coexists with a smaller species, G. fuliginosa. How might this affect bill morphology in G. fortis?

Complete the calculations using R. Remember how we compared differences between the dead and alive finches on Daphne? You need to do this for all birds on Daphne vs Santa Cruz. Remember the data frame includes the columns “both_islands” and “location”, which you will need for this exercise. Test for differences too.

2. Determine the coefficient of variation, (SD / mean) × 100, for all birds and for survivors on Daphne. The coefficient of variation expresses the variation as a percentage of the mean, and hence gives a normalised measure of dispersion. What do the coefficients of variation and the distributions of trait sizes (i.e. bill depths) tell you about the type(s) of natural selection acting during the population crash?

You should use the functions mean() and sd() to calculate these numbers. Don’t forget the data frame has columns “daphne_all” and “alive” with all the data you need.

Hand in an account of: How can data on changes in bill morphology of Darwin’s finches during a population crash be used to illustrate the four postulates of evolution by natural selection and microevolutionary change? Also answer the questions in the ‘additional exercises’.

You should include only the histograms for Postulate 1 and the regression plot for Postulate 2 as figures, with a table of means and standard deviations of bill depths as described for Postulate 1. No other figures or tables are necessary. The report must be 1600 words maximum in length (excluding figures, legends and bibliography), and detailed formatting instructions will be provided. References should be presented Harvard style. The report should also include an Abstract of fewer than 150 words. I suggest that you write it up as Title, Abstract, a brief Introduction, then work through the postulates and then the additional exercises, which form the main substance of the report. A brief ‘take home’ message at the end can be useful to round the report off. Place the figures and the Table at the end of the report, after the bibliography.
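
The manual calculations for Postulate 4 and the coefficient of variation in the additional exercises can also be checked in R. The sketch below uses made-up numbers in place of the real data, so its results mean nothing biologically; toy_all, toy_alive, the helper cv() and the value given to h2 are all assumptions. With the real data, h2 would be the slope from the mid-parent regression, for example coef(lin_mod)[2].

```r
# Made-up bill depths (mm): whole population before the crash, and its survivors
toy_all   <- c(8.6, 9.0, 9.2, 9.4, 9.6, 9.8, 10.0, 10.4)
toy_alive <- c(9.6, 9.8, 10.0, 10.4)

# Assumed heritability, standing in for the regression slope
h2 <- 0.7

# Strength of selection: mean of survivors minus mean of the whole population
S <- mean(toy_alive) - mean(toy_all)

# Predicted response to selection in the next generation
R <- h2 * S

# Hypothetical helper: coefficient of variation as a percentage of the mean
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE) * 100

c(S = S, R = R)
c(cv_all = cv(toy_all), cv_alive = cv(toy_alive))
```

S and R are both in millimetres. A positive S means survivors had deeper bills on average than the population before the crash, and R is then the predicted shift in mean bill depth among their offspring.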