
Homework 4

Multiple Regression with Interactions

Due on 4/10/2025

Background of the Data Set

Human overpopulation is a growing concern and has been associated with depletion of Earth’s natural resources and degradation of the environment. This, in turn, has social and economic consequences, such as global tension over resources like water and food, a higher cost of living, and higher unemployment rates. The data in `fertility.sav` were collected from several sources (e.g., the World Bank) and contain variables thought to correlate with fertility rate, a measure directly linked to population growth. The variables are:

· country: Country name

· region: Region of the world

· fertility_rate: Average number of children that would be born to a woman if she were to live to the end of her childbearing years and bear children in accordance with age-specific fertility rates.

· educ_female: Average number of years of formal education (schooling) for females

· infant_mortality: Number of infants dying before reaching one year of age, per 1,000 live births in a given year.

· contraceptive: Percentage of women who are practicing, or whose sexual partners are practicing, any form of contraception. It is usually measured for women ages 15–49 who are married or in union.

· gni_class: Categorization based on country’s gross national income per capita (calculated using the World Bank Atlas method)

- Low: Low-income economies; GNI per capita of $1,025 or less;

- Low/Middle: Lower-middle-income economies; GNI per capita between $1,026 and $3,995;

- Upper/Middle: Upper-middle-income economies; GNI per capita between $3,996 and $12,375;

- Upper: High-income economies; GNI per capita of $12,376 or more.

· high_gni: Dummy variable indicating whether the country has an upper-middle-income or high-income economy (low or low/middle income = 0; upper/middle or upper income = 1)

In this homework, we would like to understand whether formal education for females (educ_female) and country’s gross national income per capita (gni_class) have significant effects on fertility rate (fertility_rate). Specifically, we want to construct three models:

Model 1: continuous explanatory covariate only (educ_female)

Model 2: continuous and categorical explanatory covariates (educ_female + gni_class)

Model 3: continuous, categorical explanatory covariates, and their interactions (educ_female + gni_class + educ_female*gni_class)

Section 1: Canvas Online Submission

In this section, we will focus on testing the three models and selecting the best model.

1. The following is the ANOVA output of the three models:

ANOVA (a)

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 140.906 | 1 | 140.906 | 180.293 | <.001 (b) |
| | Residual | 95.348 | 122 | .782 | | |
| | Total | 236.254 | 123 | | | |
| 2 | Regression | 163.237 | 3 | 54.412 | 89.425 | <.001 (c) |
| | Residual | 73.016 | 120 | .608 | | |
| | Total | 236.254 | 123 | | | |
| 3 | Regression | 165.757 | 5 | 33.151 | 55.489 | <.001 (d) |
| | Residual | 70.497 | 118 | .597 | | |
| | Total | 236.254 | 123 | | | |

a. Dependent Variable: fertility_rate

b. Predictors: (Constant), educ_female

c. Predictors: (Constant), educ_female, gni_3class=Middle, gni_3class=Upper

d. Predictors: (Constant), educ_female, gni_3class=Middle, gni_3class=Upper, edu_gni_mid, edu_gni_upper

Based on the output, which of the following interpretations is correct?

a) All three models are significant because all the p-values are less than 0.05.

b) Model 3 is the best because it is more complex than the other models.

c) Model 1 is the best because it only contains a continuous covariate.
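Each F in the ANOVA table is the regression mean square divided by the residual mean square, where each mean square is its sum of squares divided by its df. The short Python sketch below recomputes those columns from the sums of squares printed in the table; because it starts from rounded values, the last decimal place may differ slightly from what SPSS reports.

```python
# Sketch: recompute the Mean Square and F columns of the ANOVA table
# from the reported sums of squares and degrees of freedom.
ANOVA = {
    # model: (SS_regression, df_regression, SS_residual, df_residual)
    1: (140.906, 1, 95.348, 122),
    2: (163.237, 3, 73.016, 120),
    3: (165.757, 5, 70.497, 118),
}


def overall_f(ss_reg: float, df_reg: int, ss_res: float, df_res: int) -> tuple:
    """Return (MS_regression, MS_residual, F) for the overall regression F test."""
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return ms_reg, ms_res, ms_reg / ms_res


results = {m: overall_f(*row) for m, row in ANOVA.items()}

for model, (ms_reg, ms_res, f) in results.items():
    print(f"Model {model}: MS_reg={ms_reg:.3f}  MS_res={ms_res:.3f}  F={f:.3f}")
```

The Sig. column then comes from comparing each F to an F distribution with (df_regression, df_residual) degrees of freedom.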

2. To test whether the interaction between `educ_female` and `gni_3class` has a significant effect, the null hypothesis is

a) H0: δ1 = 0

b) H0: δ2 = 0

c) H0: δ1 = δ2 = 0

d) H0: γ1 = γ2 = δ1 = δ2 = 0

3. The following is the Model Summary output:

Model Summary

The last five columns are the Change Statistics.

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .772 (a) | .596 | .593 | .88405 | .596 | 180.293 | 1 | 122 | <.001 |
| 2 | .831 (b) | .691 | .683 | .78005 | .095 | 18.350 | 2 | 120 | <.001 |
| 3 | .838 (c) | .702 | .689 | .77294 | .011 | 2.108 | 2 | 118 | .126 |

a. Predictors: (Constant), educ_female

b. Predictors: (Constant), educ_female, gni_3class=Middle, gni_3class=Upper

c. Predictors: (Constant), educ_female, gni_3class=Middle, gni_3class=Upper, edu_gni_mid, edu_gni_upper

Which of the following statements is correct based on the output?

a) It cannot be concluded that Model 2 is significantly better than Model 1.

b) The best model is Model 2 because Model 3 is not significantly better than Model 2 but Model 2 is significantly better than Model 1.

c) The best model is Model 3 because it has the largest R².
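The Change Statistics compare nested models: R² = 1 − SS_residual/SS_total, and F Change is the partial F test F = [(SSE_reduced − SSE_full)/df1] / [SSE_full/df2]. The Python sketch below recomputes them from the residual sums of squares in the ANOVA output of Question 1, with n = 124 (Total df 123 + 1). For the p-value it uses the closed-form upper tail of an F(2, df2) distribution, (1 + 2F/df2)^(−df2/2), which is valid only when df1 = 2 (true for both comparisons here); any other df1 would need a general F distribution function, for example from a statistics library.

```python
SST = 236.254  # Total sum of squares (identical for all three models)
N = 124        # number of countries: Total df (123) + 1

# model number -> (residual sum of squares, number of predictors excluding the constant)
MODELS = {1: (95.348, 1), 2: (73.016, 3), 3: (70.497, 5)}


def r_squared(sse: float) -> float:
    return 1 - sse / SST


def adj_r_squared(sse: float, k: int) -> float:
    # 1 minus (residual mean square / total mean square)
    return 1 - (sse / (N - k - 1)) / (SST / (N - 1))


def f_change(reduced: int, full: int) -> tuple:
    """Partial F test of a reduced model nested inside a larger (full) model."""
    sse_r, k_r = MODELS[reduced]
    sse_f, k_f = MODELS[full]
    df1 = k_f - k_r    # number of added terms
    df2 = N - k_f - 1  # residual df of the full model
    f = ((sse_r - sse_f) / df1) / (sse_f / df2)
    return f, df1, df2


def f_pvalue_df1_2(f: float, df1: int, df2: int) -> float:
    """Upper-tail p-value P(F > f); this closed form holds only for df1 == 2."""
    if df1 != 2:
        raise ValueError("closed form requires df1 == 2; use a full F distribution instead")
    return (1 + 2 * f / df2) ** (-df2 / 2)


for reduced, full in [(1, 2), (2, 3)]:
    f, df1, df2 = f_change(reduced, full)
    d_r2 = r_squared(MODELS[full][0]) - r_squared(MODELS[reduced][0])
    p = f_pvalue_df1_2(f, df1, df2)
    print(f"Model {reduced} -> {full}: R2 change={d_r2:.3f}, "
          f"F({df1}, {df2})={f:.3f}, p={p:.3g}")
```

Because the inputs are the rounded sums of squares, the recomputed values match the SPSS output to about the third decimal place.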

4. Which of the following conclusions about the test result shown in the last question is correct?

a) The model with interaction terms significantly improves the model fit.

b) It is necessary to model the three groups of `gni_3class` with different slopes.

c) It is necessary to model the three groups of `gni_3class` with different intercepts.

d) It is not necessary to form separate regression models for different groups of `gni_3class`.

Section 2: SPSS (Upload)

In this section, we only focus on Model 3. For each of the following questions, report the relevant output and answer the questions.

Recall that Model 3 includes a continuous variable (`educ_female`), a categorical variable (`gni_class`, recoded below as `gni_3class`), and their interaction.

1. Before we analyze the data, we would like to reduce the number of levels of `gni_class` from 4 to 3. Specifically, merge the levels Low/Middle and Upper/Middle into a new level called “Middle”. Name the new variable `gni_3class`. Take a screenshot of your updated data set after the new variable is created and paste it here.
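In SPSS this recoding is usually done through Transform > Recode into Different Variables. The Python sketch below shows only the mapping logic, assuming the levels are stored as the text labels listed in the variable description (in `fertility.sav` they may instead be numeric codes with value labels); `MERGE_MAP` and `to_gni_3class` are hypothetical names.

```python
# Hypothetical mapping from the four gni_class labels to the three gni_3class labels.
MERGE_MAP = {
    "Low": "Low",
    "Low/Middle": "Middle",    # merged into Middle
    "Upper/Middle": "Middle",  # merged into Middle
    "Upper": "Upper",
}


def to_gni_3class(gni_class: str) -> str:
    """Collapse a four-level gni_class label into the three-level gni_3class."""
    if gni_class not in MERGE_MAP:
        raise ValueError(f"unexpected gni_class level: {gni_class!r}")
    return MERGE_MAP[gni_class]


levels = ["Low", "Low/Middle", "Upper/Middle", "Upper"]
print([to_gni_3class(g) for g in levels])  # ['Low', 'Middle', 'Middle', 'Upper']
```

Raising an error on an unknown label, rather than silently passing it through, makes a typo or an unexpected code in the data visible immediately.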

2. Next, create the dummy variables for `gni_3class` and paste a screenshot here.

3. Finally, create the interaction terms for `educ_female*gni_3class` and paste a screenshot here.
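With Low as the reference group, each country gets two 0/1 dummies (one for Middle, one for Upper), and each interaction term is `educ_female` multiplied by one of those dummies. The sketch below assumes hypothetical dummy names `gni_mid` and `gni_upper`; the interaction names `edu_gni_mid` and `edu_gni_upper` follow the predictor labels in the Section 1 output. The input value in the usage line is made up for illustration and is not a row from the data.

```python
from dataclasses import dataclass

LEVELS = ("Low", "Middle", "Upper")


@dataclass
class Row:
    educ_female: float  # average years of schooling for females
    gni_3class: str     # one of LEVELS


def model3_columns(row: Row) -> dict:
    """Build the Model 3 predictor columns, using Low as the reference group."""
    if row.gni_3class not in LEVELS:
        raise ValueError(f"unexpected gni_3class level: {row.gni_3class!r}")
    gni_mid = 1 if row.gni_3class == "Middle" else 0   # hypothetical dummy name
    gni_upper = 1 if row.gni_3class == "Upper" else 0  # hypothetical dummy name
    return {
        "educ_female": row.educ_female,
        "gni_mid": gni_mid,
        "gni_upper": gni_upper,
        "edu_gni_mid": row.educ_female * gni_mid,      # interaction with Middle
        "edu_gni_upper": row.educ_female * gni_upper,  # interaction with Upper
    }


# Illustrative input only (not taken from fertility.sav).
print(model3_columns(Row(educ_female=8.0, gni_3class="Middle")))
```

A Low country gets zeros in all four dummy and interaction columns, which is what makes Low the reference group.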

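Let’s set `gni_3class = Low` as the reference group. Model 3 then regresses `fertility_rate` on `educ_female`, the dummy variables for Middle and Upper, and the two interaction terms.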
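Following the labels in Section 1, Question 2 (γ1, γ2 for the Middle and Upper dummies; δ1, δ2 for their interactions with `educ_female`, in the order the output lists them), and assuming β0 and β1 denote the intercept and the `educ_female` slope, Model 3 can be sketched as follows, where Middle and Upper are the 0/1 dummy variables:

```latex
\text{fertility\_rate} = \beta_0 + \beta_1\,\text{educ\_female}
  + \gamma_1\,\text{Middle} + \gamma_2\,\text{Upper}
  + \delta_1\,(\text{educ\_female} \times \text{Middle})
  + \delta_2\,(\text{educ\_female} \times \text{Upper}) + \varepsilon
```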

4. Fit the data with Model 3 and write down the estimated model.

5. Write the individual model for each of the three `gni_3class` groups.
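Assuming Model 3 is written as β0 + β1·educ_female + γ1·Middle + γ2·Upper + δ1·(educ_female × Middle) + δ2·(educ_female × Upper) (β0 and β1 are assumed labels; γ and δ follow Section 1, Question 2), setting the dummies to each group's values gives one line per group:

```latex
\begin{aligned}
\text{Low } (\text{Middle}=0,\ \text{Upper}=0)&:\quad \text{fertility\_rate} = \beta_0 + \beta_1\,\text{educ\_female}\\
\text{Middle } (\text{Middle}=1,\ \text{Upper}=0)&:\quad \text{fertility\_rate} = (\beta_0 + \gamma_1) + (\beta_1 + \delta_1)\,\text{educ\_female}\\
\text{Upper } (\text{Middle}=0,\ \text{Upper}=1)&:\quad \text{fertility\_rate} = (\beta_0 + \gamma_2) + (\beta_1 + \delta_2)\,\text{educ\_female}
\end{aligned}
```

In this form, γ shifts a group's intercept and δ shifts its slope relative to the Low group; plugging in the estimates from Question 4 gives the fitted line for each group.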

6. Interpret the following estimates:

a. The estimated intercept

b. The estimated slope associated with `educ_female`

c. The estimated coefficient associated with `gni_3class = Middle`

d. The estimated coefficient associated with `educ_female*gni_3class = Upper`

7. Test the significance of δ1. State the hypotheses, report the test statistic and the p-value, and interpret the result.
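For a single coefficient, the SPSS Coefficients table reports a t statistic equal to the estimate divided by its standard error, with residual degrees of freedom n − k − 1. Here n = 124 (Total df 123 + 1) and k = 5 predictors, giving 118, which matches the Residual df for Model 3 in Section 1. A sketch of the test setup:

```latex
\begin{aligned}
H_0&:\ \delta_1 = 0 \qquad H_a:\ \delta_1 \neq 0\\
t &= \frac{\hat{\delta}_1}{\widehat{\mathrm{SE}}(\hat{\delta}_1)},
  \qquad t \sim t_{118} \ \text{under } H_0
\end{aligned}
```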

8. Conduct the omnibus test. State the hypotheses, report the test statistic and the p-value, and interpret the result.
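The omnibus test is the overall F test in the ANOVA table: it asks whether any predictor in Model 3 is related to `fertility_rate`. Assuming β1 denotes the `educ_female` slope (with γ and δ as in Section 1, Question 2), the hypotheses and statistic can be written with the Model 3 values from the Section 1 ANOVA output:

```latex
\begin{aligned}
H_0&:\ \beta_1 = \gamma_1 = \gamma_2 = \delta_1 = \delta_2 = 0\\
H_a&:\ \text{at least one of these coefficients is nonzero}\\
F &= \frac{\mathrm{SS}_{\text{Regression}}/5}{\mathrm{SS}_{\text{Residual}}/118}
   = \frac{165.757/5}{70.497/118} \approx 55.49,
  \qquad F \sim F_{5,\,118} \ \text{under } H_0
\end{aligned}
```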