Empirical Project

Do Smaller Classes Improve Test Scores? Evidence from Experimental and Quasi-experimental Designs

Due at midnight on Thursday, December 10, 2023

Please submit your Empirical Project on Canvas. Your submission should include three files:

1. A 10-15 page replication as a Word or PDF document (including references, graphs, and tables)

2. A do-file with your Stata code

3. A log file of your Stata output

In this empirical project, we will examine both experimental and non-experimental evidence regarding the impact of class size on test scores.

Part 1

Start with caschool.dta, the California Test Score Dataset. The data were collected from all 420 K-6 and K-8 districts in California for 1998 and 1999. Test scores represent the average reading and math scores of 5th-grade students on the Stanford 9 standardized test. School characteristics, including enrollment, teachers (measured as full-time equivalents), computers per classroom, and expenditures per student, were averaged across each district. The student-teacher ratio was calculated from the numbers of teachers and students.

(a) Explain why evaluating the effect of class size on test scores has important policy implications.

(b) Create a dummy variable small for school districts with an average student-teacher ratio (str) below 18, and regress average test score (testscr) on small. Explain why the estimated coefficient on small would not measure the causal effect of class size. Would this simple comparison likely be biased upwards or downwards relative to the true causal effect? Explain.
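A minimal Stata sketch for (b), assuming caschool.dta is in the working directory (str and testscr are the variable names given above):

* Load the California Test Score data
use caschool.dta, clear

* Dummy for districts with an average student-teacher ratio below 18
gen small = (str < 18)

* Simple difference in means, estimated by OLS with heteroskedasticity-robust SEs
regress testscr small, robust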

(c) Rerun (b), adding average district income (avginc) and expenditure per student (expn_stu). Explain the change in the estimated coefficient on small, and briefly discuss why controlling for these two variables helps address the omitted variable bias.

(d) Rerun (b), adding county fixed effects. Interpret the results.
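A hedged sketch for (c) and (d). The county identifier is assumed to be a string variable named county in caschool.dta; check with describe and skip the encode step if it is already numeric:

* (c) Add the two observable controls
regress testscr small avginc expn_stu, robust

* (d) Rerun (b) with county fixed effects
encode county, gen(county_id)
areg testscr small, absorb(county_id) vce(robust)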

(e) Another way to control for the omitted variable bias associated with observables such as income and expenditure per student is to create a matched control sample based on these two variables. Employ the method of Coarsened Exact Matching (CEM): first summarize avginc and expn_stu, find a suitable way to cut the sample into strata for coarsened matching, and then check the quality of the matching in a nicely formatted balance table (see the template below). Discuss the pros and cons of increasing the number of strata in CEM matching.

(Hint: You might follow the code from Table 2 and type help cem in Stata for details.)
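A hedged Stata sketch of the CEM step. cem is a community-contributed command (install once with ssc install cem); the cutpoints shown below are purely illustrative assumptions, and choosing suitable ones is part of the exercise.

* Inspect the two matching variables before choosing strata
summarize avginc expn_stu, detail

* Coarsened exact matching on income and spending; the cutpoints in parentheses
* define the strata (these particular values are illustrative, not prescribed)
cem avginc (10 15 20 25) expn_stu (4500 5000 5500 6000), treatment(small)

* cem creates cem_matched (and cem_weights); check balance in the matched sample
tabstat avginc expn_stu if cem_matched == 1, by(small) statistics(n mean sd)
ttest avginc if cem_matched == 1, by(small) unequal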

Full sample

                              Treated (small = 1)          Control (small = 0)
Variables                     n      mean     sd           n      mean     sd           Diff in mean    SE of Diff
-------------------------------------------------------------------------------------------------------------------


CEM-matched sample

                              Treated (small = 1 &         Control (small = 0 &
                              cem_matched = 1)             cem_matched = 1)
Variables                     n      mean     sd           n      mean     sd           Diff in mean    SE of Diff
-------------------------------------------------------------------------------------------------------------------


(f) Rerun (b)-(d) in the matched sample. Interpret the results.

Part 2: The STAR Experiment

For experimental evidence, we will examine data from the Tennessee class size reduction experiment, known as Project STAR (Student–Teacher Achievement Ratio). You might also need to refer to the following paper discussed in class:

Chetty, Raj, John N. Friedman, Nathaniel Hilger, Emmanuel Saez, Diane Whitmore Schanzenbach, and Danny Yagan. 2011. "How Does Your Kindergarten Classroom Affect Your Earnings? Evidence from Project STAR," Quarterly Journal of Economics 126(4): 1593–1660.

The STAR project was a four-year experiment designed to evaluate the effect of small class sizes on learning. The study compared three different class arrangements for kindergarten through third grade: a regular-sized class, with 22 to 25 students per class, a single teacher, and no teacher's aide; a small class, with 13 to 17 students per class and no teacher's aide; and a regular-sized class with a teacher's aide.

Each school participating in the experiment had at least one class of each type, and students entering kindergarten in a participating school were randomly assigned to one of these three groups at the beginning of the 1985– 1986 academic year. Teachers were also assigned randomly to one of  the three types of classes.

According to the original experimental protocol, students would stay in their initially assigned class type for the 4 years of the experiment (kindergarten through third grade). However, because of parent complaints, students initially assigned to a regular class (with or without an aide) were randomly reassigned at the beginning of first grade to a regular class with an aide or to a regular class without an aide; most students initially assigned to a small class remained in a small class but  some were re-randomized into regular or regular+aid groups. Students entering school in first grade (kindergarten was optional), in the second year of the experiment, were randomly assigned to one of the three groups. Each year students in the experiment were given standardized tests (the Stanford Achievement Test) in reading and math.

The Project STAR public access data set contains data on test scores, treatment group assignment in each school year (kindergarten, grades 1, 2, and 3), and student and teacher characteristics for the four years of the experiment, from academic year 1985-86 to academic year 1988-89. You can access the data by typing "use http://fmwww.bc.edu/ec-p/data/stockwatson/webstar.dta, clear" in Stata. Please also check https://fmwww.bc.edu/ec-p/data/stockwatson/star_sw.des for further details of the dataset.

(a) Use the potential outcome framework to discuss what selection bias and heterogeneity bias are in observational studies and why randomization of treatment status eliminates both biases.
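A compact way to frame the answer is the standard decomposition of the naive difference in mean outcomes between treated (D_i = 1) and untreated (D_i = 0) units, written in potential-outcome notation (a reference sketch, not required output):

\begin{align*}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  &= \underbrace{E[Y_{1i} - Y_{0i}]}_{\text{average treatment effect}}
   + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}} \\
  &\quad + \underbrace{E[Y_{1i} - Y_{0i} \mid D_i = 1] - E[Y_{1i} - Y_{0i}]}_{\text{heterogeneous treatment effect bias}}
\end{align*}

Under random assignment, D_i is independent of (Y_{0i}, Y_{1i}), so both bias terms are zero and the difference in means identifies the average treatment effect.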

(b) There are two branches of treatment: small class and regular+aid.

Provide evidence that the education authorities really did randomly assign students to treatment and control groups in kindergarten. Please create a nicely formatted table (check an example below) that reports the means of several relevant characteristics for students in the treatment and control groups.

Hint: restrict your analysis to students who participated in their kindergarten year only (stark==1)

                              Small                        Regular
Variables                     n      mean     sd           n      mean     sd           Diff in mean    SE of Diff
-------------------------------------------------------------------------------------------------------------------


                              Regular+Aid                  Regular
Variables                     n      mean     sd           n      mean     sd           Diff in mean    SE of Diff
-------------------------------------------------------------------------------------------------------------------


(c) For each of the variables you summarized above, calculate: (i) the difference between the mean in the treatment group and the mean in the control group; (ii) the standard error for the difference in means. Add these as the final columns to the table you started in question (b).
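For any one row of the table, Stata's ttest reports both pieces directly. A sketch, where boy, small, and regaide are hypothetical placeholder names for a covariate and the kindergarten assignment dummies (substitute the actual names from star_sw.des or your own constructed dummies):

* Difference in means and its standard error for one covariate, small vs regular,
* kindergarten participants only; with the unequal option the reported SE is
* sqrt(s1^2/n1 + s0^2/n0)
ttest boy if stark == 1 & regaide == 0, by(small) unequal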

(d) Another way to check the quality of randomization is to run a one-way ANOVA test of the equality of key covariates across the three groups (small class, regular+aid, regular). Pick one of the variables from your balance table above and conduct a complete hypothesis test.
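A sketch of the one-way ANOVA in Stata; cltypek (the kindergarten class-type variable) and boy are hypothetical names to be replaced with the actual ones in the data:

* Test equality of the covariate's mean across the three kindergarten class types
oneway boy cltypek if stark == 1, tabulate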

(e) We define our first dependent variable as each student's combined score on the math and reading portions of the Stanford Achievement Test, taken in the kindergarten year (treadssk + tmathssk). Create this variable and present it in a histogram. Then run the regression Yi = β0 + β1 SmallClassi + β2 RegularAidi + ui, print out the results from Stata, and interpret the estimated coefficients.
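A minimal sketch for (e); treadssk, tmathssk, and stark are given above, while small and regaide are assignment dummies you need to construct from the class-type variable in the data (its name is not assumed here):

* Combined kindergarten score
gen score_k = treadssk + tmathssk

* Distribution of the combined score
histogram score_k

* Treatment-effect regression among kindergarten participants
regress score_k small regaide if stark == 1, robust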

(f)  You are thinking about whether to add additional controls to the regression. What are the pros and cons of adding more controls? What control variables would you pick in this case? Show the Stata results and interpret.

(g) Is there evidence of heterogeneous treatment effects between genders? Interpret the results.
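A sketch for (f) and (g), building on the regression in (e). Every control name below (boy, freelunch, totexpk) is a hypothetical placeholder for pre-determined student or teacher characteristics listed in star_sw.des:

* (f) Pre-determined controls can reduce residual variance without biasing the
* treatment coefficients, as long as they are not themselves outcomes of treatment
regress score_k small regaide boy freelunch totexpk if stark == 1, robust

* (g) Heterogeneous treatment effects by gender via an interaction term
regress score_k i.small##i.boy i.regaide if stark == 1, robust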

(h) In randomized trials, attrition refers to subjects dropping out of the study after being randomly assigned to the treatment or the control group (read Stock and Watson Ch. 13.2 for more details). In this case, attrition can be thought of as students who were randomly assigned to treatment/control groups but did not take both the math and reading tests in the kindergarten year. Calculate the attrition rate in this case. Discuss in general why attrition might pose a threat to identification in experiments. In this experiment, do you think attrition would be harmful for the identification of the causal effect of class size? (Hint: you could compare whether attrition rates differ by treatment/control status and along key dimensions of student or class characteristics.) For simplicity, ignore the regular+aid treatment group and focus on the comparison between small class (treatment) and regular class (control).
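A sketch of the attrition calculation, reusing the small and regaide dummies assumed above:

* Attrition: assigned in kindergarten but missing one or both kindergarten test scores
gen attrit = missing(treadssk) | missing(tmathssk) if stark == 1

* Overall attrition rate, then comparison across small vs regular assignment
summarize attrit if stark == 1
tabulate small attrit if stark == 1 & regaide == 0, row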

(i) You would like to examine whether being assigned to a smaller class in kindergarten has long-lasting effects on test scores in later grades. One idea is to regress grade 3 scores on kindergarten treatment. Assignment to small classes in the kindergarten year is correlated with assignment to small classes in later years, but a small number of students initially assigned to smaller classes were re-randomized in later years. Try the following: (1) regress grade 3 scores on kindergarten treatment, controlling for treatment status in later years; (2) restrict your analysis to the sample of students who were not assigned to smaller classes in grades 1, 2, or 3 and evaluate the impact of being assigned to a small class in the kindergarten year on grade 3 test scores. Are the two sets of results consistent? What can you conclude about the long-lasting effects of kindergarten assignment to a small class on future grades?
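A heavily hedged sketch of the two comparisons. score_g3, small_k, small_g1, small_g2, and small_g3 are hypothetical names for the grade 3 combined score and the small-class assignment dummies in kindergarten and grades 1-3; build them from the variables documented in star_sw.des:

* (1) Grade 3 score on kindergarten assignment, controlling for later assignment
regress score_g3 small_k small_g1 small_g2 small_g3, robust

* (2) Same comparison in the subsample never assigned to a small class after kindergarten
regress score_g3 small_k if small_g1 == 0 & small_g2 == 0 & small_g3 == 0, robust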

(j) Chetty et al. (2011) show that being assigned to a smaller class in kindergarten raises kindergarten test scores but has little impact on scores in later grades. Does this "fade out" effect mean that class size doesn't really matter in the long run? Why or why not?

Part 3

In this part, you will use a regression discontinuity design to estimate the causal effect of class size on test scores. To answer some of the questions, you might need to refer to the following  paper:

Angrist, Joshua D., and Victor Lavy. 1999. “Using Maimonides’ Rule to Estimate the Effect of Class Size on Scholastic Achievement,” Quarterly Journal of Economics 114(2): 533–575.

The Stata data file grade5.dta consists of test scores in fifth grade classes at public elementary schools in Israel. These data were originally used in Angrist and Lavy (1999). The graphs below were drawn using the same data.

Figure 1

Class Size as a Function of Total School Enrollment in Public Schools in Israel

 

Note: These figures plot class size as a function of total school enrollment for fourth grade and fifth grade classes in public schools in Israel in 1991.

1.   What is a binned scatter plot? Explain how it is constructed. (You might follow the code from Table 2 and type help binscatter in Stata for details.)
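A hedged sketch of a basic binned scatter plot. binscatter is a community-contributed command (install once with ssc install binscatter); the variable names classize and enrollment are assumptions, so check grade5.dta with describe:

* Binned scatter of class size against school enrollment: enrollment is split into
* equal-sized bins (20 by default) and the within-bin means of both variables are plotted
binscatter classize enrollment, nquantiles(20) linetype(none)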

2.   Graphical regression discontinuity analysis, focusing on the 40 student school enrollment threshold. See Table 2 for more guidance; a Stata sketch for parts a-d follows item 2d below.

a.   Draw a binned scatter plot to visualize how class size changes at the 40 student school enrollment threshold. Display a linear or quadratic regression line based on what you see in the data.

b.   Draw binned scatter plots to visualize how math and verbal test scores change at the 40 student school enrollment threshold. Display a linear or quadratic regression line based on what you see in the data.

c.   Draw binned scatter plots to test whether (i) the percent of disadvantaged students, (ii) the fraction of religious schools, and (iii) the fraction of female students evolve smoothly across the 40 student school enrollment threshold. Display a linear or quadratic regression line based on what you see in the data. What is the purpose of this analysis?

d.   Produce a histogram of the number of schools by total school enrollment. What is the purpose of this analysis? Note that you must collapse the data by school to produce this graph.
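A hedged Stata sketch for items 2a-2d, using binscatter's rd() option to mark the cutoff and fit separate lines on each side. All variable names (classize, enrollment, avgmath, avgverb, pct_disadvantaged, schlcode) are assumptions; substitute the actual names in grade5.dta:

* 2a. Class size around the 40-student enrollment cutoff
binscatter classize enrollment if enrollment < 80, rd(40.5) linetype(lfit)

* 2b. Math and verbal scores around the cutoff
binscatter avgmath enrollment if enrollment < 80, rd(40.5) linetype(lfit)
binscatter avgverb enrollment if enrollment < 80, rd(40.5) linetype(lfit)

* 2c. Smoothness of a predetermined characteristic (repeat for the other 2c variables)
binscatter pct_disadvantaged enrollment if enrollment < 80, rd(40.5) linetype(lfit)

* 2d. Histogram of schools by total enrollment: collapse to one observation per school first
preserve
collapse (mean) enrollment, by(schlcode)
histogram enrollment, discrete
restore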

3.   Regression analysis. Run the regressions that correspond to your three graphs in 2a and 2b to quantify the discontinuities that you see in the data. In estimating these regressions, use all observations with school enrollment less than 80. Report a 95% confidence interval for each of these estimates. See Table 2 for more guidance.
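A sketch of one way to run these regressions, allowing the slope in enrollment to differ on each side of the cutoff; variable names follow the assumptions made above:

* Indicator for being above the 40-student cutoff and a recentered running variable
gen above40 = (enrollment > 40) if !missing(enrollment)
gen run = enrollment - 40

* Discontinuities in class size and in test scores; robust output includes 95% CIs
regress classize i.above40##c.run if enrollment < 80, robust
regress avgmath  i.above40##c.run if enrollment < 80, robust
regress avgverb  i.above40##c.run if enrollment < 80, robust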

4.   Recall that any quasi-experiment requires an identification assumption to make it as good as an experiment. What is the identification assumption for a regression discontinuity design? Explain whether your graphs in 2c and 2d are consistent with that assumption.

5.   Suppose your school superintendent is considering a reform to reduce class sizes in your school from 40 to 35. Use your estimates above to predict the change in math and verbal test scores that would result from this reform.

Hint: divide the RD estimate of the change in test scores by the change in number of students per class at the threshold.
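In symbols (a sketch of the hint, where the hats denote your estimated RD discontinuities at the 40-student cutoff):

\[
\text{predicted change in scores} \;\approx\; \frac{\hat{\Delta}_{\text{test score}}}{\hat{\Delta}_{\text{class size}}} \times (35 - 40),
\]

i.e. the per-student effect of class size, scaled by the proposed 5-student reduction.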

6.   Now suppose you are asked for advice by another school that is considering reducing class size from 20 to 15 students – a 5-unit reduction, as above. Would you feel confident in making the same prediction as you did above about the impact this change would have? Why or why not? Compare the magnitudes of the treatment effects between these two studies (the STAR experiment and the Maimonides' Rule design). Be aware that the units of the test scores differ between the two settings.

7.   Given the evidence above, would you encourage your hometown school to reduce class size by hiring more teachers if the goal is to maximize students’ long-term outcomes (e.g., college attendance rates, earnings)? Explain clearly what other data you would need to make a scientific recommendation and how you would use that data.

8.   (Bonus question) A follow-up paper uses more recent data, from 2002–2011, to examine the effect of Maimonides-rule-driven class size variation on student performance.

Angrist, Joshua D., Victor Lavy, Jeremy Leder-Luis, and Adi Shany. 2019. "Maimonides Rule Redux," American Economic Review: Insights 1(3): 309–324.

In contrast to their earlier work (Angrist and Lavy, 1999), the more recent paper documents no evidence of class size effects. Read the paper briefly with the following questions in mind.

(a) The paper documents manipulation around the cutoffs. Discuss the incentives behind such manipulation and the empirical tests used to identify it.

(b) List one explanation given in the paper for why the earlier positive results documented in Angrist and Lavy (1999) do not hold in the more recent data. Is that reason a threat to the internal or external validity of the original study?