L1025 INTRODUCTION TO STATISTICS 2020
Assessment Period: May/June 2020 (A2)
SECTION A
Answer ALL parts [each part is worth 6 marks, 30 marks in total]
1.
A researcher collects data on annual salaries of senior managers of 12 firms in the hospitality industry. The data is shown in the table below, reported in thousands of pounds.
Firm | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Salary | 48 | 70 | 58 | 28 | 47 | 35 | 52 | 32 | 56 | 49 | 65 | 85
a. Calculate the mean, median, mode and standard deviation and comment briefly on what these summary statistics tell you about salaries in this industry.
b. Construct a table showing frequencies and relative frequencies of this data. Use class intervals of 0-30, 31-40, 41-50, 51-60 and 61-85. Which interval has the highest relative frequency? [You may find the table layout below useful, edit it as you see fit]
c. Describe how you would represent the relative frequencies on a chart and what it would look like. [For example, explain what type of graph you would use, what it might look like and how you would use it.]
d. The researcher collects more data and, with her new sample from 100 firms, finds the following summary information:
Salary range | Gender | | Total
 | Male | Female |
0-30 | 5 | 2 | 7
31-40 | 11 | 4 | 15
41-50 | 19 | 7 | 26
51-60 | 26 | 9 | 35
61+ | 14 | 3 | 17
Totals | 75 | 25 | 100
i. What is the probability that a female manager earns between £51,000 and £60,000?
ii. What is the probability that someone earning over £60,000 is a woman?
iii. What is the probability that someone earning £50,000 or less is a man?
e. A larger sample is collected and the researcher calculates a mean salary of £58,000 and a standard deviation of £14,000.
i. What is the probability that a senior manager earns more than £65,000?
ii. What is the probability that a senior manager earns between £42,000 and £70,000?
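The arithmetic behind parts a, d and e can be checked with a short Python sketch using only the standard library. Several choices here are assumptions rather than anything the paper states: the standard deviation uses the sample formula (divisor n - 1), each question in part d is read as a conditional probability, and part e treats salaries as normally distributed. All variable names are illustrative.

```python
from fractions import Fraction
from statistics import NormalDist, mean, median, multimode, stdev

# Part a: salaries of the 12 firms, in thousands of pounds.
salaries = [48, 70, 58, 28, 47, 35, 52, 32, 56, 49, 65, 85]

salary_mean = mean(salaries)
salary_median = median(salaries)
# Every salary occurs exactly once, so multimode returns all 12 values:
# the data has no unique mode.
salary_modes = multimode(salaries)
# Assumption: sample standard deviation (divisor n - 1).
salary_sd = stdev(salaries)

# Part d: counts from the 100-firm table, by salary range.
male = {"0-30": 5, "31-40": 11, "41-50": 19, "51-60": 26, "61+": 14}
female = {"0-30": 2, "31-40": 4, "41-50": 7, "51-60": 9, "61+": 3}

# Assumption: each question is read as a conditional probability.
p_51_60_given_female = Fraction(female["51-60"], sum(female.values()))
p_female_given_61_plus = Fraction(female["61+"], female["61+"] + male["61+"])
low_bands = ["0-30", "31-40", "41-50"]
men_low = sum(male[band] for band in low_bands)
all_low = men_low + sum(female[band] for band in low_bands)
p_male_given_50_or_less = Fraction(men_low, all_low)

# Part e: assumption that salaries follow a normal distribution
# with mean 58 and standard deviation 14 (thousands of pounds).
salary_dist = NormalDist(mu=58, sigma=14)
p_above_65 = 1 - salary_dist.cdf(65)
p_42_to_70 = salary_dist.cdf(70) - salary_dist.cdf(42)

print(f"mean={salary_mean:.2f} median={salary_median} sd={salary_sd:.2f}")
print(f"distinct modes returned: {len(salary_modes)}")
print(p_51_60_given_female, p_female_given_61_plus, p_male_given_50_or_less)
print(f"P(X > 65)={p_above_65:.4f}  P(42 < X < 70)={p_42_to_70:.4f}")
```

If the population formula (divisor n) is preferred for part a, `statistics.pstdev` can be swapped in for `stdev`.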
SECTION B
Answer TWO questions [each worth 35 marks]
2.
a. Explain what is meant by Type 1 and Type 2 errors in the context of hypothesis testing. Which error do we try to minimise when we test hypotheses and why? [4 marks]
b. A University claims that students spend on average 12 hours a week in the campus Library. Data is collected on a sample of 400 students giving a sample mean and standard deviation of 14 and 4 respectively. Use this data to test the University’s claim, with a significance level of 5%. [5 marks]
c. Show how your answer to part b would change if the sample size was only 10. [5 marks]
d. The sample of 400 students consists of 300 undergraduates and 100 postgraduates. Undergraduates spend on average 13 hours a week, with a standard deviation of 6, and postgraduate students spend on average 17 hours a week, with a standard deviation of 3. Test the claim that postgraduates use the Library more than undergraduates. [8 marks]
e. Among the sample, 100 students live on campus. They spend on average 12 hours a week in the Library, with a standard deviation of 5. Find the 99% Confidence Interval for the population mean. [5 marks]
f. The survey of students shows that 78% of campus-based students would like the Library to increase its opening hours compared to 82% of off-campus students. Test whether these responses are different. Use a 5% significance level. [8 marks]
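The test statistics in question 2 can be sketched in Python as follows. This is a rough check under stated assumptions, not a model answer: part d is tested one-sided at an assumed 5% level (the paper gives no level for it), part f assumes the 100 on-campus students and the remaining 300 off-campus students all answered the survey, and the critical t value for 9 degrees of freedom (2.262) is copied from standard tables because the standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
z_crit_two_sided = std_normal.inv_cdf(0.975)  # about 1.96 for a 5% two-sided test

# Part b: H0: mu = 12, large sample (n = 400), so use a z test.
z_b = (14 - 12) / (4 / sqrt(400))
reject_b = abs(z_b) > z_crit_two_sided

# Part c: same test with n = 10, so use a t test with 9 degrees of freedom.
t_c = (14 - 12) / (4 / sqrt(10))
T_CRIT_9DF = 2.262  # two-sided 5% critical value, taken from t tables
reject_c = abs(t_c) > T_CRIT_9DF

# Part d: H1: postgraduate mean > undergraduate mean.
# Assumption: one-sided test at the 5% level.
z_d = (17 - 13) / sqrt(3**2 / 100 + 6**2 / 300)
reject_d = z_d > std_normal.inv_cdf(0.95)

# Part e: 99% confidence interval for the on-campus mean (n = 100).
z_99 = std_normal.inv_cdf(0.995)
half_width = z_99 * 5 / sqrt(100)
ci_99 = (12 - half_width, 12 + half_width)

# Part f: two-proportion z test with a pooled proportion.
# Assumption: 100 on-campus and 400 - 100 = 300 off-campus respondents.
n_on, n_off = 100, 300
p_on, p_off = 0.78, 0.82
p_pool = (p_on * n_on + p_off * n_off) / (n_on + n_off)
se_f = sqrt(p_pool * (1 - p_pool) * (1 / n_on + 1 / n_off))
z_f = (p_on - p_off) / se_f
reject_f = abs(z_f) > z_crit_two_sided

print(f"b: z={z_b:.2f} reject={reject_b}")
print(f"c: t={t_c:.3f} reject={reject_c}")
print(f"d: z={z_d:.3f} reject={reject_d}")
print(f"e: 99% CI=({ci_99[0]:.3f}, {ci_99[1]:.3f})")
print(f"f: z={z_f:.3f} reject={reject_f}")
```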
3.
a. The following charts show scatter plots of pairs of variables. Explain which chart shows evidence of:
i. Positive correlation [2 marks]
ii. Negative correlation [2 marks]
iii. No correlation [2 marks]
b. A junior economist is examining the relationship between the real price of sugary drinks and consumption. They collect quarterly data for two years, as follows:
Quarter | Price (pence per litre) | Consumption (litres per person)
1 | 120 | 3
2 | 122 | 2.9
3 | 124 | 3.1
4 | 125 | 2.8
5 | 129 | 2.5
6 | 130 | 2.5
7 | 132 | 2.8
8 | 135 | 2.4
i. Calculate the Pearson correlation coefficient between price and consumption. Show your working for full marks. What does your answer suggest about the demand curve for sugary drinks? [7 marks]
ii. Test the statistical significance of the correlation coefficient, using a 5% significance level. [7 marks]
c. Using the data in part b), a simple linear regression of the relationship between price and consumption of sugary drinks is estimated and produces the following results:
Consumption = 7.76 – 0.039 Price
i. Interpret the intercept and slope coefficients [2 marks]
ii. Predict the amount of sugary drinks consumed when price is £1.28 [2 marks]
iii. Calculate the price elasticity of demand for sugary drinks and interpret your answer. [5 marks]
iv. Calculate and interpret the goodness of fit, R², for this regression, and use it to test whether the model has any explanatory power. [6 marks]
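The working for question 3, parts b and c, can be laid out step by step in Python. The sketch below computes the sums of squares by hand rather than calling a library routine, since the question asks for working. The critical values (2.447 for t with 6 degrees of freedom, 5.99 for F with 1 and 6 degrees of freedom) are copied from standard tables, and evaluating the elasticity at the sample means is one common convention, assumed here rather than specified by the paper.

```python
from math import sqrt

# Quarterly data from part b.
price = [120, 122, 124, 125, 129, 130, 132, 135]        # pence per litre
consumption = [3.0, 2.9, 3.1, 2.8, 2.5, 2.5, 2.8, 2.4]  # litres per person

n = len(price)
mean_x = sum(price) / n
mean_y = sum(consumption) / n

# Sums of squares and cross products of deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, consumption))
s_xx = sum((x - mean_x) ** 2 for x in price)
s_yy = sum((y - mean_y) ** 2 for y in consumption)

# Part b i: Pearson correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Part b ii: t test of H0: rho = 0 with n - 2 degrees of freedom.
t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)
T_CRIT_6DF = 2.447  # two-sided 5% critical value, taken from t tables
significant = abs(t_stat) > T_CRIT_6DF

# Part c: regression reported in the question.
intercept, slope = 7.76, -0.039
predicted = intercept + slope * 128  # a price of 1.28 pounds is 128 pence

# Assumption: elasticity evaluated at the sample means.
elasticity_at_means = slope * mean_x / mean_y

# In a simple regression R squared is the square of r, and the F statistic
# for overall explanatory power equals the square of the t statistic.
r_squared = r**2
f_stat = r_squared / (1 - r_squared) * (n - 2)
F_CRIT_1_6 = 5.99  # 5% critical value for F(1, 6), taken from F tables
has_explanatory_power = f_stat > F_CRIT_1_6

print(f"r={r:.4f} t={t_stat:.3f} significant={significant}")
print(f"predicted consumption at 128p: {predicted:.3f} litres")
print(f"elasticity at means: {elasticity_at_means:.3f}")
print(f"R^2={r_squared:.4f} F={f_stat:.2f} power={has_explanatory_power}")
```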
4.
You are a junior health economist working for the World Health Organisation and your task is to examine the factors that affect female life expectancy. Using country-level data you estimate the following regression using Ordinary Least Squares:
Life_Expectancy = a + b1 GNP_pc + b2 Births + b3 Civil_War + e
where Life_Expectancy is the number of years a woman born in 2019 may expect to live;
GNP_pc is Gross National Product per capita, measured in $US; Births is the number of births per 10000 women; and Civil_War is a dummy variable which takes values of 1 if the country experienced a civil war in the period 2000-2019, and 0 otherwise.
You obtain the results shown in the Excel table below. Note that some information has been deliberately omitted.
Multiple R | BLANK
R Square | BLANK
Adjusted R Square | 0.807
Standard Error | 4.893
Observations | 91
ANOVA
 | df | SS | MS | F
Regression | 3 | 9067.24 | 3022.41 | 126.22
Error | 87 | 2083.31 | 23.95 |
Total | 90 | 11150.55 | |
 | Coefficients | Standard Error | t Stat | P-value
Intercept | 84.2767 | 1.8624 | 45.252 | 0.0000
GNP pc | 0.0002 | 0.0001 | BLANK | 0.0259
Births | -0.6506 | 0.0487 | BLANK | BLANK
Civil_War | -0.5812 | 0.1981 | -2.933 | 0.0010
a. Interpret the coefficients of GNP_pc and Births. [4 marks]
b. Test whether the birth rate has a statistically significant effect on female life expectancy. Use a significance level of 5%. [5 marks]
c. Interpret the P-value for GNP_pc. [5 marks]
d. What is the impact of civil war on female life expectancy? [5 marks]
e. How much of the variation in female life expectancy is explained by the model? [5 marks]
f. Test the null hypothesis that b1=b2=b3=0 [5 marks]
g. Outline briefly how you would attempt to improve this model (no more than 150 words) [6 marks]
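The blank entries in the Excel output can be recovered from the figures that are given, as the following Python sketch suggests. It relies only on the ANOVA sums of squares and the printed coefficients and standard errors. The critical values are approximate table readings for 87 degrees of freedom and are assumptions of this sketch. Because the GNP pc coefficient and its standard error are rounded to four decimal places, their ratio is only a coarse approximation of the true t statistic.

```python
from math import sqrt

# Figures taken from the regression output.
n_obs, k = 91, 3  # observations and number of regressors
ss_regression, ss_error, ss_total = 9067.24, 2083.31, 11150.55
df_error = n_obs - k - 1  # 87, matching the ANOVA table

# Part e: share of variation explained, plus the omitted Multiple R.
r_squared = ss_regression / ss_total
multiple_r = sqrt(r_squared)
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_error

# Part f: F test of H0: b1 = b2 = b3 = 0.
f_stat = (ss_regression / k) / (ss_error / df_error)
F_CRIT_3_87 = 2.71  # approximate 5% critical value from F tables
reject_joint_null = f_stat > F_CRIT_3_87

# Part b: t statistic for Births, coefficient over standard error.
t_births = -0.6506 / 0.0487
T_CRIT_87DF = 1.99  # approximate two-sided 5% critical value from t tables
births_significant = abs(t_births) > T_CRIT_87DF

# GNP pc: ratio of rounded figures, so only a rough value.
t_gnp_rough = 0.0002 / 0.0001

print(f"R^2={r_squared:.4f} Multiple R={multiple_r:.4f}")
print(f"adjusted R^2={adj_r_squared:.3f}")
print(f"F={f_stat:.2f} reject joint null={reject_joint_null}")
print(f"t(Births)={t_births:.2f} significant={births_significant}")
print(f"t(GNP pc), rough={t_gnp_rough:.1f}")
```

The adjusted R² computed this way agrees with the 0.807 shown in the output, which is a useful consistency check on the recovered R².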
2023-08-02