SOC2093 Advanced Quantitative Methods 2021/22

Assignment 2

Due: 16th January 2022 by 16:00

This exercise comprises two parts that are together worth 70% of the module mark. It is based on the World Development Indicators (2013) (WDI2013.sav) data that we have been using in some of the workshops and seminars. You will be able to complete the assignment using the material from weeks 5-9 of the course. When discussing significance you may also want to refer to material from week 4.

WORLD DEVELOPMENT INDICATORS analysis

We have used WDI2013 data in some of the workshops and classes, but please feel free to consult the following sources of information, if you wish:

https://datacatalog.worldbank.org/dataset/world-development-indicators

https://ukdataservice.ac.uk/use-data/guides/dataset/development-indicators.aspx

Most of the variables in the WDI dataset are ‘continuous’ variables, which means we can run correlations and linear/multiple regressions to explore the relationships between the variables, just as we have been doing in class. This part of the assignment requires you to answer some questions about correlations and regressions, explain how you would run a multiple regression and associated diagnostics, and describe and interpret some outputs from SPSS based on the WDI variables. The final question asks you to construct and interpret your own regression model.

Part 1: Total 50 marks

1. What is the difference between a correlation and a regression? What does a regression analysis enable us to understand/conclude that a correlation does not?

2. When might we want to run a multiple regression? What is the difference between running a multiple regression with two independent variables and two separate linear regressions?

3. Of the following variables, which would most plausibly be the independent variables and which would be the dependent variable in a multiple regression? What impact do we think the independent variables may be having on the dependent variable and why?

• Percentage of the labour force with intermediate (secondary) education [lfeduc]

• GDP, GDP per capita (constant 1995 US$) [gdp]

• Percentage of the labour force that is female [lffem]

4. Continuing with the data outlined in (3), produce scatterplots (or a scatterplot matrix) and describe the relationship between the variables.

5. Based on your assessment of the scatterplots, would we want to do anything with any of our variables (e.g. transform the data)? Outline the main ‘assumptions’ of a multiple regression and explain how your adjustments may relate to them.

6. We want to understand the extent to which Government expenditure on education (speneduc) and the poverty level (measured as the % of the population earning less than $1 a day) (doladay) are related to adult literacy rates (litracy). Produce a multiple regression in R to look at the relationship between these variables. Include the R code and model results here.

7. Based on the output from your model, construct the regression equation and write in a sentence what the regression equation tells us. Having done this, comment on whether each independent variable is a significant predictor of the dependent variable and what we may expect to find in terms of the estimated impact of each of the independent variables on the dependent variable. Remember to also discuss the R² value.

8. Check the assumptions of the model. Include the diagnostic plots and provide some general discussion of the extent to which the main assumptions have been met.

9. You are interested in understanding whether ‘continent’, a qualitative variable, also plays a role in the relationship discussed in (6). You would like to know whether this relationship differs by whether the country is in Africa or not. Extend your regression model by adding in this variable and comment on the new results.

10. Take another look at the model assumptions and comment on any changes.
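
As a reminder for questions 7 and 9, the general shape of the regression equations may be sketched as below. This is only an illustrative template: the coefficient symbols b0 to b3 and the 0/1 indicator name `africa` are assumptions introduced here, not names from the dataset, and the actual values must come from your own model output. Adding the indicator as a simple extra term is one possible reading of question 9; other specifications (for example, interaction terms) may also be defensible.

```latex
% Question 7: two quantitative predictors (b_0, b_1, b_2 taken from your output)
\widehat{\text{litracy}} = b_0 + b_1\,\text{speneduc} + b_2\,\text{doladay}

% Question 9: the same model with a hypothetical 0/1 indicator
% (africa = 1 if the country is in Africa, 0 otherwise)
\widehat{\text{litracy}} = b_0 + b_1\,\text{speneduc} + b_2\,\text{doladay} + b_3\,\text{africa}
```

In this form, each slope is read as the expected change in the dependent variable for a one-unit increase in that predictor, holding the other predictors constant, and b3 is the expected difference between African and non-African countries at the same values of the other predictors.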

Part 2: Total 20 marks

11. Finally, estimate a new multiple regression model to explain variations in contraceptive use (variable is contra) across countries. You must include at least 3 independent variables (quantitative or qualitative) from the dataset in your regression model, but the choice of which ones is up to you. Make any transformations to the data you feel are necessary and clearly explain why you have made them in your answer. The answer should include a clear interpretation of the main results from your model, including an indication of the extent to which the main assumptions of a linear regression have been met. You may want to use your answers to questions 3-10 to help. A typical word-length for this question would be around 250 words.