
L1025 INTRODUCTION TO STATISTICS

Assessment Period: (A2)

SECTION A

Answer ALL parts [each part carries 5 marks, 30 marks in total]

1.

A researcher collects data on weekly incomes of 10 families in Brighton. The data is shown in the table below, reported in pounds (£).

Table 1 Weekly Incomes

Family   1     2     3     4     5     6     7     8     9     10
Income   225   625   350   250   325   320   400   420   375   450

a. Calculate the mean, median and mode of income and comment on your results.

b. Calculate the standard deviation and the coefficient of variation and comment on your answers.

c. What happens to inequality if the family with an income of £625 increases their income to £925?

d. Use the data in Table 1 to draw an appropriate graph and comment on what the graph shows.

e. Table 2 below shows responses by gender of a sample of 900 adults in the UK who were asked whether or not they buy a newspaper at the weekend.

Table 2 Gender and Newspaper Purchases

 

               Men   Women
Newspaper      20    50
No Newspaper   380   450

What is the probability that

i) a newspaper buyer is male? [1 MARK]

ii) a woman buys a newspaper? [1 MARK]

iii) a person selected at random is female and buys a newspaper? [1 MARK]

iv) a person selected at random is either male or buys a newspaper? [2 MARKS]

f. Historically the average mark of students on a test is 48% with a standard deviation of 8.  

i. What is the probability that the mark for a randomly selected student is over 70%? 

ii. What is the probability that the mark for a randomly selected person is between 40% and 60%?
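The calculations behind parts a to c, e and f can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model answer: it assumes the sample standard deviation (divisor n - 1) is wanted, reads parts e.i and e.ii as conditional probabilities, and assumes test marks are normally distributed, which the question does not state. All variable names are illustrative.

```python
import statistics
from statistics import NormalDist

# Table 1: weekly incomes (in pounds) of the 10 families.
incomes = [225, 625, 350, 250, 325, 320, 400, 420, 375, 450]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
# multimode returns every value when no value repeats (no unique mode).
modes = statistics.multimode(incomes)
# Assumption: sample standard deviation, i.e. divisor n - 1.
sd_income = statistics.stdev(incomes)
cv_income = sd_income / mean_income  # coefficient of variation

# Part c: the highest income rises from 625 to 925.
incomes_c = [925 if x == 625 else x for x in incomes]
cv_income_c = statistics.stdev(incomes_c) / statistics.mean(incomes_c)

# Table 2: counts by gender and newspaper purchase.
men_buy, women_buy, men_not, women_not = 20, 50, 380, 450
total = men_buy + women_buy + men_not + women_not

p_i = men_buy / (men_buy + women_buy)            # P(male | buys), assumed reading
p_ii = women_buy / (women_buy + women_not)       # P(buys | female), assumed reading
p_iii = women_buy / total                        # P(female and buys)
p_iv = (men_buy + men_not + women_buy) / total   # P(male or buys)

# Part f. Assumption: marks follow a normal distribution, mean 48 and sd 8.
marks = NormalDist(mu=48, sigma=8)
p_over_70 = 1 - marks.cdf(70)
p_40_to_60 = marks.cdf(60) - marks.cdf(40)

print(mean_income, median_income, round(sd_income, 2), round(cv_income, 3))
print(round(cv_income_c, 3))
print(round(p_i, 3), round(p_ii, 3), round(p_iii, 3), round(p_iv, 3))
print(round(p_over_70, 4), round(p_40_to_60, 4))
```

Comparing `cv_income_c` with `cv_income` gives one simple way to discuss the change in dispersion in part c; the written comments the question asks for still have to be supplied separately.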

SECTION B

Answer TWO questions [each question is worth 35 marks]

2.

a. Imagine you work for a Consumer Rights organisation. A well-known manufacturer of chocolate bars advertises that each bar of chocolate they produce weighs at least 200 grams. You test a sample of 625 chocolate bars and find that the mean weight is 198 grams with a standard deviation of 4 grams. On the basis of this evidence, would you prosecute the manufacturer for a breach of advertising standards? [5 marks]

b. Explain how and why your decision in part a would change if your sample was only 9 chocolate bars. [4 marks]

c. Based on data from the 2011 Population Census, the UK government reports that 14 million households had access to broadband at home out of a total of 42 million households. In 2020, you run a survey of 1000 households which finds that 500 of them have broadband at home. Is this sufficient evidence to conclude that the proportion of households with access to broadband has increased over time?   [4 marks]

d. Calculate the 99% confidence interval for the proportion of households with broadband access in 2020, using the data from part c. Remember to interpret your answer. [4 marks]

e. The sample in part c contains 500 households from England, 345 of which have broadband access, and 100 households from Scotland, 45 of which have broadband access. Use a 5% significance level to test if the proportion of households with broadband access is higher in England than it is in Scotland. [5 marks]

f. Table 3 shows data on the distribution of firms across three sectors by firm size (in terms of number of employees).

Table 3. Firm Distribution by Sector

 

            Sector
Firm size   Agriculture   Services   Manufacture   Total number of firms
Small       270           108        90            468
Medium      91            42         39            172
Large       40            20         23            83
Total       401           170        152           723

Is there evidence of an association between firm size and sector? [6 marks]
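The test statistics that question 2 calls for can be sketched as below. This is a rough illustration, not a marking scheme. It assumes large-sample z-tests are acceptable for parts a, c and e, a pooled standard error for the two-proportion test, and a Pearson chi-square test of independence for Table 3. The function names are hypothetical, and the chi-square critical value is taken from standard tables.

```python
import math
from statistics import NormalDist

def z_one_mean(xbar, mu0, s, n):
    """z statistic for a single mean, using the sample sd s."""
    return (xbar - mu0) / (s / math.sqrt(n))

def z_one_prop(p_hat, p0, n):
    """z statistic for a single proportion under H0: p = p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def ci_prop(p_hat, n, level):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def z_two_props(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, with a pooled standard error (assumption)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

z_a = z_one_mean(198, 200, 4, 625)             # part a
z_c = z_one_prop(500 / 1000, 14 / 42, 1000)    # part c
ci_d = ci_prop(500 / 1000, 1000, 0.99)         # part d
z_e = z_two_props(345, 500, 45, 100)           # part e

# Part f: rows are Small, Medium, Large; columns are the three sectors.
table3 = [[270, 108, 90], [91, 42, 39], [40, 20, 23]]
chi2, df = chi_square_independence(table3)
CHI2_CRIT_5PCT_DF4 = 9.488  # from standard chi-square tables
reject_independence = chi2 > CHI2_CRIT_5PCT_DF4

print(z_a, round(z_c, 2), ci_d, round(z_e, 2), round(chi2, 2), df)
```

For part b, a sample of only 9 bars would normally call for the t distribution with 8 degrees of freedom rather than the normal approximation used here.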

3. Use the data in the Excel spreadsheet “crime.xls”. The spreadsheet contains data on 42 Local Police Force Areas in England, as described in Table 4:

Table 4: Description of Crime data

BR    Burglary rate per 1000 people in the Police Force Area population.
CR    The conviction rate, i.e. the percentage of all crimes solved in the Police Force Area.
SEN   The average sentence length (in months) dispensed by the judiciary in the Police Force Area.
UR    The male unemployment rate in the Police Force Area.
HO    The percentage of households with three rooms or less in the Police Force Area.

Your task is to analyse the factors associated with burglaries. The literature suggests that, generally, levels of crime are affected by efforts to deter crime and by social and economic deprivation. Here we measure crime by the burglary rate, BR; deterrence by the conviction rate, CR, and by the average sentence length, SEN; and social deprivation by the male unemployment rate, UR, and housing overcrowding, HO.

a. Explain in your own words whether there is likely to be a positive, negative or no correlation between the burglary rate and each of the other variables. [6 marks]

b. Use Excel to draw a scatter plot of the burglary rate, BR, against ONE other variable. Copy and paste the scatter plot into your answer file and comment on whether there is any evidence of a correlation between BR and the variable you have chosen. [4 marks]

c. Calculate the correlation coefficient between the burglary rate and the variable you have chosen in part b, remembering to interpret and comment on your answer. [4 marks]

d. Using a 5% significance level, test if the correlation coefficient obtained in part c is statistically significant. [4 marks]

e. Use Excel to estimate a simple linear regression of the relationship between burglary rate and your chosen variable from part b, i.e.

BR = a + bX + e, where X stands for your chosen variable from part b.

i. Copy and paste your Excel output into your answer document. Ensure it is neatly presented. Interpret the intercept and slope coefficient and comment on whether either is statistically significant. [4 marks]

ii. Interpret the goodness of fit for your regression, and use it to test if the model has any explanatory power.   [4 marks]

iii. Do you think your regression is useful for understanding the factors related to crime? Explain what you would do to improve it. 150 WORDS MAX [7 marks]
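Although question 3 asks for Excel, the quantities in parts c to e can be checked with a short Python sketch. The function names are hypothetical, and the small data set is invented purely to exercise the functions; it is not taken from crime.xls. The significance test uses the usual statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_correlation(r, n):
    """t statistic for H0: population correlation is zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def ols_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Invented toy data, NOT the crime.xls values.
x_toy = [1, 2, 3, 4, 5]
y_toy = [2, 1, 4, 3, 5]

r_toy = pearson_r(x_toy, y_toy)
t_toy = t_for_correlation(r_toy, len(x_toy))
intercept_toy, slope_toy = ols_line(x_toy, y_toy)

print(round(r_toy, 3), round(t_toy, 3), round(intercept_toy, 3), round(slope_toy, 3))
```

With the 42 Police Force Areas in the real data, the t statistic has 40 degrees of freedom, for which standard tables give a two-sided 5% critical value of about 2.021.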

4. You are an assistant economist in the government and you wish to analyse the determinants of wages. Using data on a sample of 935 workers you estimate the following regression using Ordinary Least Squares:

wage_i = a + b1 hours_i + b2 educ_i + b3 exper_i + b4 tenure_i + b5 age_i + e_i

where wage is monthly earnings measured in £; hours is the number of hours worked weekly; educ is the number of years of education; exper is the total number of years worked; tenure is the number of years working with the current employer; and age is the worker’s age measured in years.

You obtain the results shown in the Excel table below. Note that some information has been deliberately omitted.

R Square            0.151
Adjusted R Square   0.147
Observations        935

ANOVA
                   df    SS             MS           F        Significance F
Regression         5     23109547.22    4621909.44   33.129   BLANK
Residual (Error)   929   129606621.00   139511.97
Total              934   152716168.22

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   -439.165       165.154          -2.659   0.008     -763.284    -115.046
Hours       -1.936         1.704            -1.137   0.256     -5.280      1.407
Educ        71.104         BLANK            BLANK    0.000     BLANK       BLANK
exper       10.496         BLANK            BLANK    BLANK     3.049       17.943
tenure      7.217          2.529            2.853    BLANK     BLANK       BLANK
Age         10.609         4.779            BLANK    BLANK     BLANK       BLANK

a. Interpret the coefficients of any THREE independent variables and comment on whether there is evidence that any of these variables have an effect on wages. [6 marks]

b. How much of the variation in wages is explained by the model? Test the null hypothesis that b1 = b2 = b3 = b4 = b5 = 0. [5 marks]

c. Use the data in the Excel spreadsheet “wage.xls” to answer this final part. The wage.xls file contains data on 935 workers, their wage and the independent variables hours, educ, exper and age, as defined above. In addition there is a binary variable female, which is equal to 1 if the worker is female, 0 otherwise.

i. Estimate a multiple regression model that allows you to explore the gender pay gap. Copy and paste your output into your answer document and comment on what your results say about wage differences between male and female workers. [8 marks] 

ii. Re-estimate the model in part c i but replacing wage with its natural logarithm. Discuss your results. [8 marks]

iii. Describe how you might improve your model. (150 WORDS MAX) [8 marks]
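For parts a and b of question 4, some of the omitted entries in the regression output can be recovered from the figures that are given. The sketch below is one way to do that. It assumes a normal approximation to the t distribution, which is reasonable with 929 residual degrees of freedom but still an approximation, so the recovered standard error and p-values are rounded estimates rather than the exact Excel values. The helper names are hypothetical.

```python
from statistics import NormalDist

n_obs, k = 935, 5            # observations and number of slope coefficients
df_resid = n_obs - k - 1     # 929, matching the ANOVA table

# Sums of squares from the ANOVA table.
ss_reg, ss_resid = 23109547.22, 129606621.00
ms_reg = ss_reg / k
ms_resid = ss_resid / df_resid
f_stat = ms_reg / ms_resid                   # F test of b1 = ... = b5 = 0
r_squared = ss_reg / (ss_reg + ss_resid)     # share of variation explained

def t_stat(coef, se):
    """t statistic for H0: coefficient = 0."""
    return coef / se

def approx_two_sided_p(t):
    """Two-sided p-value using the normal approximation (assumption)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# tenure and age: coefficient and standard error are both reported.
t_tenure = t_stat(7.217, 2.529)
t_age = t_stat(10.609, 4.779)
p_tenure = approx_two_sided_p(t_tenure)
p_age = approx_two_sided_p(t_age)

# exper: only the 95% interval is reported, so back out the standard error
# from the interval half-width, again using the normal critical value.
z_crit = NormalDist().inv_cdf(0.975)
se_exper = (17.943 - 3.049) / (2 * z_crit)
t_exper = t_stat(10.496, se_exper)

print(round(f_stat, 3), round(r_squared, 3))
print(round(t_tenure, 3), round(p_tenure, 4))
print(round(t_age, 3), round(p_age, 4))
print(round(se_exper, 3), round(t_exper, 3))
```

The standard error for educ cannot be recovered this way, because the output reports only its coefficient and a p-value rounded to 0.000.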