STAT1070 Assignment 2 Semester 1, 2022
Question 1. [16 marks]
A garbage collection division of a local council is interested in their garbage statistics. They contract you as a data analyst to analyse their collection processes. The first task is to assess the performance of two collection teams to determine which team is faster at completing their collection runs.
The council records the time each of two teams takes to complete the collection run for each suburb and stores these times in a database. A random sample from the database is shown below, with the collection times for each suburb.
Suburb  Team 1  Team 2
A       87      68
B       44      32
C       106     92
D       57      59
E       93      90
F       104     82
G       90      69
H       28      37
I       55      54
J       32      23
(a) [1 mark] Are the Team 1 and Team 2 samples paired or independent? Write a sentence justifying your choice. The mark will be allocated for your justification.
(b) [2 marks] Enter the data into jamovi and provide output showing the means and standard deviations for the two teams. Which team appears to be faster on average?
Hint: The layout of the data in jamovi will be different depending on whether you think the data is paired or independent. Be consistent with your answer in part (a).
(c) [6 marks] Carry out an appropriate hypothesis test to investigate whether there is a difference between the two teams' collection times. Be sure to state the null and alternative hypotheses, the observed test statistic, the null distribution, the p-value, and an appropriate conclusion in plain language.
(d) [3 marks] Report the 95% confidence interval for the difference in average collection times between the two teams. Write a sentence interpreting this interval in plain language. Does this confidence interval support the decision made in part (c)?
(e) [4 marks] State and check the assumptions that are necessary for the analysis in parts (c) and (d).
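For readers who want to cross-check their jamovi output by hand, the following is a minimal Python sketch of the summary statistics and a paired-difference analysis. It assumes the table values are read column-wise (Team 1 for suburbs A to J, then Team 2) and that the samples are treated as paired by suburb, which is the choice part (a) asks you to justify. The critical value 2.262 is the 0.975 quantile of the t distribution with 9 degrees of freedom, taken from standard tables; the variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

# Collection times, ASSUMING the table is read column-wise (suburbs A to J).
team1 = [87, 44, 106, 57, 93, 104, 90, 28, 55, 32]
team2 = [68, 32, 92, 59, 90, 82, 69, 37, 54, 23]

# Per-team summary statistics (sample standard deviation, n - 1 denominator).
for name, times in (("Team 1", team1), ("Team 2", team2)):
    print(f"{name}: mean = {mean(times):.2f}, sd = {stdev(times):.2f}")

# Paired analysis (an assumption): work with the within-suburb differences.
diffs = [a - b for a, b in zip(team1, team2)]
n = len(diffs)
mean_diff = mean(diffs)
se_diff = stdev(diffs) / sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / se_diff       # compare with a t distribution on n - 1 df

# 95% confidence interval; 2.262 is t(0.975) with 9 df, from tables.
t_crit = 2.262
ci = (mean_diff - t_crit * se_diff, mean_diff + t_crit * se_diff)

print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {n - 1}")
print(f"95% CI for the mean difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The p-value is left to jamovi here, since the Python standard library has no t distribution function.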
Question 2. [16 marks]
Researchers at the University of Wisconsin-Madison are interested in porosity and density measurements of different types of rocks. Details of how their measurements are taken can be found at https://wgnhs.wisc.edu/maps-data/data/rock-properties/understanding-porosity-density/.
The data set geology-data.omv has been downloaded from the above website and is also available on Canvas. The data set shows several variables related to core samples taken throughout Wisconsin’s aquifers and aquitards.
(a) [3 marks] Provide an appropriate plot and summary statistics to investigate whether there is a difference in the average Porosity between the three types of rock (Lithology) in the data set. Compare the location of each distribution of porosity and comment on whether you think there is a difference in the population means.
(b) [6 marks] Perform an appropriate test to see if there is a significant difference in porosity across the three rock types at the five percent significance level. Be sure to state the null and alternative hypotheses, the observed test statistic, the null distribution, the p-value, and an appropriate conclusion in plain language.
(c) [3 marks] If required, perform post-hoc tests to determine which rock types have significantly different mean porosity values. Support your conclusions about each comparison. If post-hoc tests are not required, explain why not.
(d) [4 marks] State and check the assumptions that are necessary for the analysis in part (b).
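As an illustration of how a test statistic for comparing three group means can be built up, here is a minimal Python sketch of the one-way ANOVA F statistic. It assumes one-way ANOVA is the test being considered, which the question does not state. The function name, group labels, and porosity values are all hypothetical and are not taken from geology-data.omv.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# HYPOTHETICAL porosity values (%) for three made-up rock types.
porosity_by_type = {
    "type_a": [1.0, 2.0, 3.0],
    "type_b": [2.0, 3.0, 4.0],
    "type_c": [3.0, 4.0, 5.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(porosity_by_type.values()))
print(f"F = {f_stat:.3f} on ({df_b}, {df_w}) degrees of freedom")
```

Under the null hypothesis of equal population means, this statistic would be compared with an F distribution on the two degrees of freedom shown.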
Question 3. [24 marks]
Another group of researchers at The University of Wisconsin-Madison are specifically interested in sandstone and whether its porosity varies with depth. To facilitate their analysis they modified the data set from Question 2 to only include observations of sandstone. They also converted the depth variable from feet to kilometres.
This new data set is sandstone.omv.
(a) [4 marks] Generate a scatterplot to investigate the relationship between depth and porosity. Briefly describe the relationship.
(b) [3 marks] Write down the equation for the estimated regression line and provide an interpretation of the intercept and the slope coefficient.
(c) [1 mark] Use the equation of the regression model to predict the porosity of a sandstone sample taken from a depth of 256 m.
(d) [2 marks] The 8th data point in the data set is a measurement taken at a depth of 256 m. Calculate the difference between the observed porosity at 256 m and the predicted value you found in part (c). What is the name used in regression to describe what you just calculated?
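The steps in parts (b) to (d) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the depth and porosity values and the "observed" porosity are hypothetical and are not from sandstone.omv, and the helper name `fit_line` is made up. Because depth is stored in kilometres in this data set, a depth of 256 m must be entered as 0.256 km.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares fit; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# HYPOTHETICAL data: depth in km, porosity in %.
depth_km = [0.1, 0.2, 0.3, 0.4]
porosity = [20.0, 18.0, 17.0, 15.0]

intercept, slope = fit_line(depth_km, porosity)

# Convert metres to kilometres before using the fitted equation.
depth_m = 256
predicted = intercept + slope * (depth_m / 1000)

# Observed minus predicted; 18.0 is a hypothetical observed value.
observed = 18.0
difference = observed - predicted

print(f"porosity = {intercept:.2f} + ({slope:.2f}) * depth_km")
print(f"predicted at {depth_m} m: {predicted:.3f}, observed - predicted: {difference:.3f}")
```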
(e) [6 marks] Is there a statistically significant linear relationship between porosity level and the depth the core sample was taken from? Be sure to state the null and alternative hypotheses, test statistic, null distribution, p-value, decision and an appropriate conclusion in plain language.
(f) [4 marks] State the assumptions necessary for your regression analysis in part (e) to be appropriate. State whether each of them is satisfied with a brief justification. This justification may refer to appropriate output from jamovi.
(g) [2 marks] Provide a 95% confidence interval for the slope of the population regression line of porosity level on depth. Write an interpretation of this interval.
(h) [2 marks] Write down the R² value for this regression and give an interpretation.
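To show where an R² value comes from, the sketch below computes it as one minus the ratio of residual variation to total variation for a simple linear regression. The function name `r_squared` and the data are hypothetical and are not from sandstone.omv; jamovi reports this value directly.

```python
def r_squared(x, y):
    """R^2 for a least-squares line of y on x: 1 - SSE / SST."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares: variation left unexplained by the line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares: variation of y around its mean.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# HYPOTHETICAL data: depth in km, porosity in %.
depth_km = [0.1, 0.2, 0.3, 0.4]
porosity = [20.0, 18.0, 17.0, 15.0]

r2 = r_squared(depth_km, porosity)
print(f"R^2 = {r2:.4f}")
```

For these made-up numbers, the value would be read as the proportion of the variation in porosity that is explained by the linear relationship with depth.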
2022-05-11