Analysis of Hox’s Popularity Data Using Stata
Exercise 1: Analysis of Hox’s Popularity Data Using Stata
Soc. 2960S
[Please submit a document that includes your output and a brief description of your findings in each step. We will go over the assignment in class. You are welcome to work with other students, but the work you submit should be your own. If you use AI in your coding and writing, please describe how you used it.]
Use Hox’s “popular2” data set. This is simulated data in which the lowest level of observation (level-1) is the pupil level, and the highest level of observation (level-2) is the school level.
The “popular2” data are available as a Stata file on Canvas.
The first step is to find out what is in popular2.dta. For this purpose, use the -codebook- command.
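A minimal sketch of this first look at the data, assuming popular2.dta has been downloaded into Stata's current working directory:

```stata
* Load the data (assumes popular2.dta is in the working directory)
use popular2.dta, clear

* Compact overview: variable names, storage types, labels
describe

* Detailed report for every variable: range, unique values, missing counts
codebook
```

Running -codebook- with no variable list covers every variable in the file; adding a variable list restricts the report to those variables.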
Let Y = popularity, X = gender. Create a file of bivariate within-school regression lines. Do this in the following way. Using OLS, compute the school-specific bivariate regressions of Y on X using the -statsby- command. Save the results to a new file. This will be a school-level file (call it “schooldat1.dta”). For this step, take advantage of -preserve- and -restore-. You will need the original data set for the next step.
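One way this step might look. The variable names are assumptions to check against your -codebook- output: popular for Y, sex for X, and school for the level-2 identifier (in some copies of the file the identifier may have a different name). The names b_cons and b_sex for the saved coefficients are hypothetical.

```stata
use popular2.dta, clear

* Keep a copy of the pupil-level data so it can be brought back afterwards
preserve

* Run one OLS regression per school and keep the intercept and slope.
* The clear option replaces the data in memory with one row per school.
statsby b_cons=_b[_cons] b_sex=_b[sex], by(school) clear: regress popular sex

save schooldat1.dta, replace

* Bring back the original pupil-level data for the next step
restore
```

Naming the saved statistics explicitly avoids depending on the default names that -statsby- would otherwise assign.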
Using the original data, create a school-level file that contains three variables: (i) the within-school means of Y, (ii) the within-school means of X, and (iii) Z = teacher experience. Your best bet is to use -collapse- for this step. Once you have created the file (call it “schooldat2.dta”), merge it with schooldat1.dta on the school identifier and save the result as “schooldat.dta”. You now have level-1 and level-2 data files.[1]
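A sketch of the collapse-and-merge step, under the same naming assumptions (popular, sex, texp for teacher experience, and school as the identifier). The names mean_popular and mean_sex are hypothetical, and schooldat1.dta is assumed to contain exactly one row per school.

```stata
use popular2.dta, clear

* One row per school: means of Y and X, plus Z.
* Z is constant within a school, so its mean is simply its value.
collapse (mean) mean_popular=popular mean_sex=sex texp, by(school)
save schooldat2.dta, replace

* One-to-one merge with the file of within-school OLS coefficients
merge 1:1 school using schooldat1.dta

* Every school should appear in both files; inspect before dropping
tabulate _merge
drop _merge

save schooldat.dta, replace
```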
Now we’ll use Stata’s -xt- commands with popular2.dta. You’ll fit a series of models. Bear in mind that the models I ask you to fit are not necessarily presented in the order you might use in a specific analysis of real data. Here, I am merely providing an opportunity for you to explore possibilities with Stata.
Regress Y on nothing (that is, the right hand side will contain only the intercept), disregarding clustering.
Regress Y on X disregarding clustering.
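The two pooled OLS fits above might be run as follows, assuming Y is stored as popular and X as sex:

```stata
use popular2.dta, clear

* Intercept-only model: the constant is the grand mean of Y
regress popular

* Pooled OLS of Y on X, ignoring the clustering of pupils in schools
regress popular sex
```

Because clustering is ignored, the standard errors here treat all pupils as independent observations, which is worth keeping in mind when comparing them with the later multilevel fits.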
Regress Y on nothing. Do this three ways: (i) GLS random intercept (-xtreg- with the re option); (ii) maximum likelihood random intercept (-xtreg- with the mle option); and (iii) -mixed- (called -xtmixed- before Stata 13). Compare your output across the three sets of results in this step.
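A sketch of the three empty-model fits, assuming the variable names popular and school. The stored-estimate names gls, ml, and mix are hypothetical labels chosen for this illustration.

```stata
use popular2.dta, clear

* Declare the grouping variable for the -xt- commands
xtset school

* (i) GLS random-intercept estimator
xtreg popular, re
estimates store gls

* (ii) Maximum likelihood random-intercept estimator
xtreg popular, mle
estimates store ml

* (iii) Mixed model with a random intercept for school
mixed popular || school:
estimates store mix

* Intraclass correlation from the mixed model, to compare with rho from -xtreg-
estat icc

* Side-by-side coefficients and standard errors.
* Variance parameters are reported on different scales across commands.
estimates table gls ml mix, b(%9.4f) se(%9.4f)
```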
Regress Y on X using the three options indicated in Step 6 (GLS random intercept, maximum likelihood random intercept, and mixed). Make all appropriate comparisons within this step.
Regress Y on X and Z using the options indicated in Step 6. Do all three options produce results? Make all relevant comparisons.
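The last two steps repeat the same three estimators with different right-hand sides, so a loop is one compact way to run them. This is only a sketch, assuming the names popular, sex, texp, and school; the macro name rhs is hypothetical.

```stata
use popular2.dta, clear
xtset school

* First pass: Y on X. Second pass: Y on X and Z.
foreach rhs in "sex" "sex texp" {
    display as text _newline "Right-hand side: `rhs'"

    * GLS random intercept
    xtreg popular `rhs', re

    * Maximum likelihood random intercept
    xtreg popular `rhs', mle

    * Mixed model with a random intercept for school
    mixed popular `rhs' || school:
}
```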
Regress Y on X, Z, and XZ using the options used in previous steps. This time also specify a random slope for Z. Note any differences within and between steps (pay attention to coefficients, rho, etc.).
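A sketch of the interaction models, assuming the names popular, sex, texp, and school. Only -mixed- accepts a random slope; the two -xtreg- fits remain random-intercept models. The local macro slopevar is a hypothetical convenience: it is set to texp to follow the wording of the step, and because Z is constant within a school, a random slope on it lets the between-school variance depend on Z. Changing the macro to sex gives a random slope on the pupil-level predictor instead.

```stata
use popular2.dta, clear
xtset school

* Factor-variable notation: ## expands to X, Z, and the X-by-Z product
xtreg popular c.sex##c.texp, re
xtreg popular c.sex##c.texp, mle

* Variable that receives the random slope (assumption: texp, per the step text)
local slopevar texp

* Random intercept and random slope, with their covariance left free.
* A model this rich may be slow to converge or may fail to converge.
mixed popular c.sex##c.texp || school: `slopevar', covariance(unstructured)
```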
Thus far you have estimated models that include a cross-level interaction. The only direct assessment of the interaction has been through the use of coefficients and standard errors, but it is also helpful to examine graphs. One graphing strategy in multilevel analysis is to plot within-context intercepts and coefficients against contextual variables. We’ll do that with the popularity data.
Using schooldat.dta, plot the OLS intercepts against Z. Also plot the OLS slopes against Z. Try using OLS to regress the intercepts against Z, and separately, the slopes against Z. Does linearity suffice? How would you check? Consider applying a lowess smoother (see -lowess-).
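These plots and regressions might be produced as follows. The sketch assumes the school-level file stores the OLS intercepts and slopes as b_cons and b_sex (hypothetical names; use whatever names your -statsby- step saved) and teacher experience as texp. The graph names are also hypothetical.

```stata
use schooldat.dta, clear

* Intercepts against Z: points, linear fit, and lowess curve on one graph
twoway (scatter b_cons texp) (lfit b_cons texp) (lowess b_cons texp), ///
    name(int_vs_z, replace) title("OLS intercepts vs. teacher experience")

* Slopes against Z, drawn the same way
twoway (scatter b_sex texp) (lfit b_sex texp) (lowess b_sex texp), ///
    name(slope_vs_z, replace) title("OLS slopes vs. teacher experience")

* School-level OLS regressions of the intercepts and slopes on Z
regress b_cons texp
regress b_sex texp
```

Where the lowess curve tracks the straight line closely, a linear specification in Z is probably adequate; systematic departures suggest trying a curved term.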
Thus far we have not dealt explicitly with assumptions about the errors. What is there to do? In the hierarchical model we assume normality; hence we would want to check that. We might also check the within-school homogeneity assumption. Do boys and girls have the same error variance? In addition, we might want to check whether the error variance is constant across schools (between-school homogeneity).
Check whether there is heterogeneity by sex, once school is taken into account and differential sex-contrasts across schools have been allowed for. Using popular2.dta, regress Y on X, a dummy variable classification for school, and the saturated interaction of X with the dummy variable classification. Obtain the residuals for this regression. Use these residuals to construct box plots for boys and girls. You can do this with:
graph box resid, over(sex)
where “resid” is the name of the variable containing the residuals. What do you find?
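The full sequence for this check could look like the sketch below, assuming the names popular, sex, and school. The -tabstat- line is an optional addition that is not part of the step as written.

```stata
use popular2.dta, clear

* Saturated model: sex, school dummies, and every sex-by-school interaction
regress popular i.sex##i.school

* Save the OLS residuals
predict resid, residuals

* Box plots of the residuals for boys and girls
graph box resid, over(sex)

* Optional numeric companion to the plot: residual SD and count by sex
tabstat resid, by(sex) statistics(sd n)
```

This model amounts to a separate regression of Y on X within each school, so its residuals match those of the school-by-school OLS fits.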
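To check the normality assumption, it is helpful to construct a residual quantile-normal plot (-qnorm-).
The quantile-normal plot is most sensitive to departures from normality in the tails, whereas the standardized normal probability plot (-pnorm-) emphasizes the center of the distribution. Try this plot.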
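A sketch of the normality plots. It assumes the residuals of interest are the OLS residuals from the saturated sex-by-school regression, with variables named popular, sex, and school; the graph names are hypothetical.

```stata
use popular2.dta, clear

* Refit the saturated model quietly and recover its residuals
quietly regress popular i.sex##i.school
predict resid, residuals

* Quantile-normal (Q-Q) plot
qnorm resid, name(qq_resid, replace)

* Standardized normal probability (P-P) plot
pnorm resid, name(pp_resid, replace)
```

In both plots, points lying close to the reference line are consistent with normally distributed residuals.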
[1] This is not a necessary step for many parts of this assignment. However, this is a routine step in creating files for use in HLM, so those of you planning to use HLM should get comfortable with it.
2026-02-03