
GGES3006/6018: QUANTITATIVE ASSESSMENT I

7 February 2023

INSTRUCTIONS

1. You are expected to answer 11 questions - 10 questions in Section A and one question in Section B depending on whether you are a level 6 or a level 7 student.

2. Each question in Section A is worth 5 marks and those in Section B are worth 50 marks as indicated below the questions.

3. All the data sets needed to answer the questions are provided in the folder named “Quant_assessmentI_datasets” on Blackboard.

4. Where applicable, questions must be answered using either Excel or R for data preparation, and R for data analysis.

5. Also, your answers must include explanatory text, R code and R outputs. Note in particular that inclusion of your R code is essential to gain good marks.

6. When you have finished, save your answer paper as a PDF file and upload it to eAssignment before the deadline of 12 noon on Friday March 10, 2023.

7. Do not forget to include your Student ID number in your answer paper.

SECTION A - SHORT ANSWER QUESTIONS

QUESTION A1

Briefly describe quantitative, qualitative and mixed methods research. Provide examples of Sustainability and Environmental Science research topics that can be investigated using each method. Be as specific as possible in your examples.

[5 marks]

QUESTION A2

Briefly explain the following terms often used in quantitative research (not more than 100 words each):

- Descriptive and inferential statistics

- Parametric and non-parametric statistics

- Causal inference and predictive inference

- Frequentist and Bayesian statistics

- Probability and non-probability sampling techniques

[5 marks]

QUESTION A3

The data Bird_eggs_data.csv contains information on the contaminant levels of the eggs of a bird species found in different coastal locations. The contaminants measured were PCB, DDE and DDD. The data also includes the thickness (THICK) of the eggshells.

Perform a pipeline of descriptive analysis in R, including measures of central tendency and variability to describe the dataset. Interpret and discuss the results.

[5 marks]
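A possible starting point for the descriptive step in Question A3 is sketched below. It assumes the CSV has numeric columns named PCB, DDE, DDD and THICK, as the question text suggests; the helper name `describe_columns` is hypothetical, and the actual column names should be checked with `names()` after reading the file.

```r
# Hypothetical helper: central tendency and variability for chosen columns.
describe_columns <- function(dat, cols) {
  stats <- sapply(dat[cols], function(v) {
    c(n      = sum(!is.na(v)),
      mean   = mean(v, na.rm = TRUE),
      median = median(v, na.rm = TRUE),
      sd     = sd(v, na.rm = TRUE),
      iqr    = IQR(v, na.rm = TRUE),
      min    = min(v, na.rm = TRUE),
      max    = max(v, na.rm = TRUE))
  })
  t(stats)  # one row per variable, one column per statistic
}

# Usage (assumes the file is in the working directory):
# eggs <- read.csv("Bird_eggs_data.csv", header = TRUE)
# describe_columns(eggs, c("PCB", "DDE", "DDD", "THICK"))
```

Returning a matrix with one row per variable keeps the contaminants and the shell thickness side by side, which makes the interpretation step easier to write up.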

QUESTION A4

The plot below was hurriedly created by a student working on a research project investigating risk factors associated with non- and under-vaccination in a given country.

 

Suggest at least two ways in which the plot can be improved.

[5 marks]

QUESTION A5

The data file CO2_concentrations.txt contains information on annual mean carbon dioxide concentrations measured in a particular city from 1958 to 2007.

(a) Produce a time series plot of the data to show the trend in CO2 concentration. Show both lines and points on your plot and label the plot appropriately.

(b) Calculate the mean CO2 concentration during the observation period and show this as a horizontal line on the plot.

(c) During which years were the measurements above the mean value?

[5 marks]
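The three parts of Question A5 can be approached with base R graphics, roughly as sketched below. The column names `year` and `co2` and the helper name `plot_co2_trend` are assumptions made for illustration; the real names in CO2_concentrations.txt need to be checked after reading the file.

```r
# Hypothetical helper; 'year' and 'co2' are assumed column names.
plot_co2_trend <- function(dat) {
  avg <- mean(dat$co2, na.rm = TRUE)
  # type = "b" draws both lines and points
  plot(dat$year, dat$co2, type = "b", pch = 16,
       xlab = "Year", ylab = "Annual mean CO2 concentration",
       main = "Annual mean CO2 concentration by year")
  abline(h = avg, lty = 2, col = "red")  # horizontal line at the overall mean
  # Return the years whose measurement exceeds the overall mean
  dat$year[!is.na(dat$co2) & dat$co2 > avg]
}

# Usage (assumes a whitespace-delimited file with a header row):
# co2 <- read.table("CO2_concentrations.txt", header = TRUE)
# plot_co2_trend(co2)
```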

QUESTION A6

A sample of size 50 yielded the following data. Note that the second-largest value has been removed and is shown as NA. This will NOT affect your ability to answer the questions below.

68.6 64.7 47.4 56.4 69.3

50.9 48.3 66.7 70.8 50.5

49.9 56.0 59.6 64.0 72.4

87.8 45.2 72.5 38.0 79.2

41.3 68.7 59.6 67.3 76.3

51.9 NA   70.6 50.3 51.0

59.4 61.4 59.4 72.0 42.2

58.8 77.1 50.2 51.9 51.8

47.2 46.7 54.2 46.8 70.7

53.2 37.9 30.2 61.1 43.5

Read the data into R and answer the following questions.

(a) Calculate the mean and the standard deviation of the data. What proportion of the data lies within one standard deviation of the mean?

(b) Test the hypothesis that the mean of the population from which the sample was taken is greater than 40.

[5 marks]
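For Question A6, one way to enter the data and handle the missing value is sketched below. The values are copied from the table above; the choice of a one-sided one-sample t-test is an assumption about what part (b) intends, and the object names are illustrative.

```r
# Sample values as listed in the question; NA marks the removed observation.
x <- c(68.6, 64.7, 47.4, 56.4, 69.3,
       50.9, 48.3, 66.7, 70.8, 50.5,
       49.9, 56.0, 59.6, 64.0, 72.4,
       87.8, 45.2, 72.5, 38.0, 79.2,
       41.3, 68.7, 59.6, 67.3, 76.3,
       51.9, NA,   70.6, 50.3, 51.0,
       59.4, 61.4, 59.4, 72.0, 42.2,
       58.8, 77.1, 50.2, 51.9, 51.8,
       47.2, 46.7, 54.2, 46.8, 70.7,
       53.2, 37.9, 30.2, 61.1, 43.5)

m <- mean(x, na.rm = TRUE)  # sample mean of the 49 observed values
s <- sd(x, na.rm = TRUE)    # sample standard deviation

# Proportion of observed values within one standard deviation of the mean
prop_within <- mean(abs(x - m) <= s, na.rm = TRUE)

# One-sided test of H0: mu = 40 against H1: mu > 40 (t.test drops the NA)
tt <- t.test(x, mu = 40, alternative = "greater")

c(mean = m, sd = s, prop_within = prop_within)
tt
```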

QUESTION A7

Some scientists are studying the weights of a certain species of animals born in a zoo in an East African country. The weights, in kg, of the animals born in the zoo are normally distributed with mean μ and variance 290.

A random sample of 14 animals born in the zoo was taken and these had the following weights, to the nearest kg:

23 64 27 20 48 67 59

18 25 58 24 37 52 30

a. Calculate the mean weight, to the nearest kg, of the animals in the sample and construct a 95% confidence interval for the population mean μ.

b. The scientists claim that the mean weight of the animals born in the zoo is 45 kg. Use your answer to (a) to comment on this claim.

[5 marks]
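Because the population variance in Question A7 is stated as known (290), a normal-based interval is a natural choice; the sketch below follows that assumption. It uses the unrounded sample mean in the interval, which is a judgment call, and the object names are illustrative.

```r
# Weights of the 14 sampled animals, as listed in the question
w <- c(23, 64, 27, 20, 48, 67, 59,
       18, 25, 58, 24, 37, 52, 30)

sigma2 <- 290               # population variance given in the question
n      <- length(w)
xbar   <- mean(w)           # sample mean
se     <- sqrt(sigma2 / n)  # standard error when the variance is known
z      <- qnorm(0.975)      # critical value for a two-sided 95% interval

ci <- c(lower = xbar - z * se, upper = xbar + z * se)

round(xbar)  # mean weight to the nearest kg
ci
# Does the claimed mean of 45 kg fall inside the interval?
ci["lower"] <= 45 && 45 <= ci["upper"]
```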

QUESTION A8

A fellow student you were working with on a group assignment handed the R code below to you to run, as well as an accompanying dataset.

install.packages("mgcv")

library(mgvc)

pisa <- read.csv("pisasci2006.csv",header = TRUE)

pisa <- na.omit(Pisa)

assess.mod <- lm(Overall ~ Income, data=pisa)

plot(asses.mod, uniform=TRUE, compress=TRUE, branch=0.8)

plot(Overall ~ Income, data = pisa)

abline(assess.mod)

text(income, all=TRUE, digits=4, use.n=TRUE, cxe=0.6, xpd=TRUE)

Rsq <- cor(predict(assess.mod),pisa$Overall)^1/2

Find five mistakes in the code and briefly explain how to correct them.

[5 marks]

QUESTION A9

Using the sim_dat.csv dataset, create a histogram of the flow variable. Also, create histograms of the log and square root transformations of the variable. Label your histograms appropriately. Which of these transformations would you recommend as being more appropriate to improve normality?

[5 marks]
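The three histograms asked for in Question A9 can be drawn side by side for comparison, as in the sketch below. The helper name `plot_flow_histograms` is hypothetical, and the code assumes the flow values are strictly positive so that the log transformation is defined.

```r
# Hypothetical helper: raw, log and square-root histograms in one row.
plot_flow_histograms <- function(flow) {
  flow <- flow[!is.na(flow)]
  stopifnot(all(flow > 0))  # log() requires strictly positive values
  old <- par(mfrow = c(1, 3))
  on.exit(par(old))         # restore the previous plotting layout
  hist(flow,       main = "Flow (original scale)", xlab = "flow")
  hist(log(flow),  main = "Log-transformed flow",  xlab = "log(flow)")
  hist(sqrt(flow), main = "Square root of flow",   xlab = "sqrt(flow)")
  invisible(list(raw = flow, log = log(flow), sqrt = sqrt(flow)))
}

# Usage (assumes the file has a column named 'flow'):
# sim <- read.csv("sim_dat.csv", header = TRUE)
# plot_flow_histograms(sim$flow)
```

A normal Q-Q plot (`qqnorm()` followed by `qqline()`) of each transformed vector is one way to back up the visual comparison.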

QUESTION A10

Using the (better) transformed variable from QUESTION A9 above as the dependent variable (assuming now that it is on its original scale) and the ‘size’ column from the data as the independent variable, fit a simple linear regression model. Write down the fitted model and interpret the estimated regression coefficient.

[5 marks]

SECTION B – LONG ANSWER QUESTIONS

QUESTION B1 (Level 6 students only)

A researcher is interested in analysing the probability of survival of humans attacked by a certain deadly reptile. They have collected some data on such attacks that happened in the past. The variables included in the data are: Survival (1 – if the person survived; 0 – if the person did not survive), Dweight – the difference in body weight between the animal and the victim, Assistance – whether they received any assistance during the attack or not, and Position – the location of the attack. The data are available in the file “Animal_attacks.txt”.

Perform a logistic regression analysis to determine the significant predictors of the odds of survival. Write a short report to explain your analysis, provide your R code and give the researcher feedback on what needs to be done to prevent future attacks or improve the survival of attack victims.    

What needs to change in this analysis if one wants to use linear regression?

[50 marks]
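A minimal sketch of the model fit for Question B1 is given below. It assumes the file has a header row with columns named Survival, Dweight, Assistance and Position, as described in the question, and that Assistance and Position are categorical; the helper name `fit_survival_model` is hypothetical, and model checking and interpretation are left to the report.

```r
# Hypothetical helper: logistic regression of survival on the three predictors.
fit_survival_model <- function(dat) {
  dat$Assistance <- factor(dat$Assistance)  # treated as categorical (assumption)
  dat$Position   <- factor(dat$Position)    # treated as categorical (assumption)
  glm(Survival ~ Dweight + Assistance + Position,
      family = binomial(link = "logit"), data = dat)
}

# Usage (assumes a whitespace-delimited file with a header row):
# attacks <- read.table("Animal_attacks.txt", header = TRUE)
# fit <- fit_survival_model(attacks)
# summary(fit)    # Wald tests for each coefficient
# exp(coef(fit))  # coefficients expressed as odds ratios
```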

QUESTION B2 (Level 7 students only)

In order to understand global patterns in plant height, a group of scientists assembled data on the maximum heights of different plant species from different parts of the world. They also obtained data on ten environmental variables from the locations where the measurements were taken. They are interested in knowing whether plant height can be reasonably predicted using a subset of these variables. If this is the case, then less effort would be expended on data collection to predict or understand patterns in plant height.

The file Plant_height.csv provides the data and contains the following variables: location - site number, height - plant height, loght - log-transformed plant height, alt – altitude, temp - annual mean temperature, temp.seas - temperature seasonality, temp.mean.wetqr - mean temperature of the wettest quarter, isotherm – isothermality, rain - annual precipitation, rain.seas - precipitation seasonality, rain.dryqr - precipitation of the driest quarter, LAI - leaf area index and NPP - net primary productivity.

Before your analysis, log-transform altitude and temperature using the formulae:  and  to minimize the influence of high-altitude sites and very low temperatures. Also note that the dependent variable, height, has been log-transformed and you should use it in this form.

Carry out an appropriate analysis to determine whether plant height can be predicted from the ten variables provided. Write a short report which must explain and justify your modelling approach, provide your R code, and give feedback to the scientists on the value of their variables. They would be interested to know whether they can use fewer variables to predict plant height and whether any of the ten variables can be discarded as uninformative.

[50 marks]
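One of several defensible approaches to Question B2 is a multiple linear regression followed by backward selection on AIC, sketched below. This is an illustrative choice, not a prescribed method; alternatives such as best-subset selection or penalised regression could be justified equally well. The helper name `select_height_predictors` is hypothetical, the column names are taken from the variable list in the question, and the code assumes altitude and temperature have already been transformed as the question specifies.

```r
# Hypothetical helper: full model on loght, then backward elimination by AIC.
select_height_predictors <- function(dat) {
  predictors <- c("alt", "temp", "temp.seas", "temp.mean.wetqr", "isotherm",
                  "rain", "rain.seas", "rain.dryqr", "LAI", "NPP")
  # Keep complete cases so every candidate model uses the same rows
  dat  <- na.omit(dat[c("loght", predictors)])
  f    <- reformulate(predictors, response = "loght")
  full <- lm(f, data = dat)
  step(full, direction = "backward", trace = 0)
}

# Usage (after transforming alt and temp in 'plants'):
# plants <- read.csv("Plant_height.csv", header = TRUE)
# reduced <- select_height_predictors(plants)
# summary(reduced)  # retained variables and their estimated effects
```

Comparing the reduced and full models (for example with `anova()` or adjusted R-squared) would show how much predictive value is lost by discarding variables.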