SOC302: Statistics for Social Research

Final Data Analysis Project


1. Description

In this data analysis project, you are given a small dataset drawn from the General Social Survey (GSS). The dataset has 1,376 respondents and 21 variables. It covers the respondents’ basic demographic characteristics (e.g., sex, race, age), socioeconomic status (education, socioeconomic index), well-being (e.g., quality of life, physical health), and attitudes (whether our country is spending too much or too little on childcare assistance).

***You may discuss the project with your classmates as you work on it, but you must write and submit your own project report and R code.***


2. Project Steps

You will need to go through the following steps to complete your final project:

STEP (1). Make sure that you download the three files related to this project: (1) this document; (2) a .csv data file; (3) a codebook.

STEP (2). Next, go through the codebook to see which variables the dataset contains and how they are measured.

STEP (3). You should make sure you know how to load the .csv data in R.

STEP (4). Pick your key dependent variable of interest (Y). Note that this variable should be a numeric variable. In this project, you will examine how this variable varies and how it depends on other variables.

STEP (5). Complete the following components of your project report. Put all of these components into one single document.

(a) Plot a histogram of Y. Report the sample mean and sample standard deviation of Y.

(b) Construct a 95% confidence interval for the population mean of Y.

(c) Conduct a hypothesis test of the difference in the mean of Y between two groups. For example, the two groups can be men and women, or whites and blacks. Use 5% as the significance level.

(d) What factor may determine Y? Estimate a bivariate (simple linear) regression model to predict Y using a numeric independent variable (X) of your choice. Interpret your results (slope and R-squared).

(e) Draw a scatterplot of Y against the independent variable X from (d), and include the line for the prediction equation in the plot.

(f) What other factors may determine Y? Think of at least three independent variables and use 2-3 sentences to explain why these variables may affect Y. Then, estimate a multiple regression model to predict Y using these independent variables. Interpret your results (coefficients and R-squared).

(g) Write one short paragraph (3-5 sentences) to discuss the broader implications of your findings in (c)-(f) above for sociology.

STEP (6). You will need to submit the following two things to your TA via email ([email protected]):

1. Your final project report.

2. The complete R script (.R file) that contains all the code that you used to conduct the analyses.
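
As a rough orientation for STEP (3) and components (a)-(f) of STEP (5), the outline below sketches one way the analyses might be laid out in base R. It is only an illustration under stated assumptions: the data are simulated rather than read from the project file, and the column names (sei, educ, age, sex) and the file name in the commented read.csv() line are hypothetical, not taken from the codebook. Replace them with your own choice of Y and independent variables.

```r
# Simulated stand-in for the project data. All column names are hypothetical.
set.seed(302)
n <- 200
gss <- data.frame(
  educ = sample(8:20, n, replace = TRUE),    # years of schooling (assumed X)
  age  = sample(18:89, n, replace = TRUE),
  sex  = factor(sample(c("female", "male"), n, replace = TRUE))
)
# Assumed numeric dependent variable Y
gss$sei <- 10 + 3 * gss$educ + 0.1 * gss$age + rnorm(n, sd = 12)

# STEP (3): with the real data you would load the file instead, for example
# gss <- read.csv("your_data_file.csv")   # hypothetical file name

y <- gss$sei

# (a) Histogram, sample mean, and sample standard deviation of Y
hist(y, main = "Histogram of Y", xlab = "Y")
y_mean <- mean(y, na.rm = TRUE)
y_sd   <- sd(y, na.rm = TRUE)

# (b) 95% confidence interval for the mean of Y (based on the t distribution)
ci <- t.test(y, conf.level = 0.95)$conf.int

# (c) Two-group test of the difference in means (Welch t test by default);
#     compare the p-value with the 5% significance level
tt <- t.test(sei ~ sex, data = gss)
reject_null <- tt$p.value < 0.05

# (d) Regression of Y on one numeric independent variable
fit1  <- lm(sei ~ educ, data = gss)
slope <- coef(fit1)["educ"]
r2_1  <- summary(fit1)$r.squared

# (e) Scatterplot of Y against X with the prediction line added
plot(gss$educ, gss$sei, xlab = "X", ylab = "Y")
abline(fit1)

# (f) Multiple regression with three independent variables
fit2 <- lm(sei ~ educ + age + sex, data = gss)
r2_2 <- summary(fit2)$r.squared
summary(fit2)   # coefficient table and R-squared
```

Because the data here are simulated, the numbers this sketch produces say nothing about the GSS. Your report should interpret the output from the actual dataset.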