Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Data and Policy Summer Scholar Program

Education Policy

Summer 2022

Introduction

In this capstone project you will try to estimate the impact of a public policy designed and introduced by the City of Chicago, named Safe Passage Program. Use this link to familiarize yourself with the program. The main idea behind the program is “to provide a positive, trusted adult presence for students as they travel to and from school. 

About the datasets

For this Capstone we are providing you with four datasets:

The first one is called: ‘metrics_attendance_2021.xlsx’. This dataset contains the school attendance infor- mation for every public school in the City of Chicago from 2003 to 2021 (we don’t have information about 2020 due to the covid-19 pandemic). Although we’re providing this dataset, you can find it here.

The second and the third datasets are the: ‘control.csv’and the ‘treated.csv”. These datasets contain the ID of the schools that had the Safe Passage Policy implemented in their surroundings, the year that the implementation took place (treated schools), and the schools where the policy was not implemented.  In addition, both datasets contain each school’s latitude and longitude position.

The fourth dataset is the crime.csv.’. This file contains information about the crimes that occurred within

a radius of fewer than 500 yards from the school.  Some of this information is the type of the crime that occurred, the Latitude and Longitude of the crime, and its location description.

The fifth dataset is the:‘schools-vs-crime.csv’. This file contains three variables:

1.  CRIME_ID: the unique identifier (ID) of a crime

2.  SCHOOLID: the unique identifier (ID) of the school

3. distance: the linear distance (in yards) between a school and a crime scene. It only contains the crimes that happened within a less than 500 yards linear radius range from the schools.

For this project you will need the following packages: ‘lubridate’, ‘tidyverse’,‘plm’,‘leaflet’,‘lmtest’.

Question 1

In this exercise you will have to transform the metrics dataset: it’s called‘metrics_attendance_2021.xlsx’.

a) Load the ‘metrics_attendance_2021.xlsx’ dataset into your RStudio session.

Hint: the data can be find in sheet number two (2) of the excel file

b) After loading the dataset, you should remove the column named‘2021’and transform this dataset so that its columns that go from 2003 to 2019 become rows. Once you’ve done such a transformation, filter this new column so it only contains data for the years between 2013 and 2018 (both years included). Finally, filter the dataset so the Group’column only has information about ‘All (Excludes Pre-K)’.

Hint: remember the pivot functions we have learned in classes

Question 2

a) Load the‘control.csv’and the ‘treated.csv’files into your current RStudio session. After binding them together, join it with the dataset from (q1).

b) Create a map displaying the treated and control schools, each group of schools in a different color. Using four or five lines describe the distribution of the treated schools throughout Chicago.  Do they seem to be concentrated in any region of the city?

Hint: the‘leaflet’package is an interesting package to be used for this exercise - more info here

Hint 2: if you decide to go with the leaflet package, see that you will have different options to plot the points in the map - such as ‘addCircleMarkers’,  ‘addCircles’, etc.

Question 3

In this exercise, we will be transforming the crime data so it can be further joined with our dataset from (q2). You will also have to perform some exploratory data analysis with the dataset.

a) Load the‘crime.csv’and the‘school-crime-distance.csv’datasets into your R current session. Join them together so, at the end, you have a dataset containing the crimes that occurred in Chicago between

2013 and 2018 and their distance from Chicago Public Schools.

b) Using the leaflet’package, create a map that shows all the crimes that took place within a 50 yards distance from schools. Just looking at the map, does it seems to exist any concentration of crimes in any particular area of the city? Or do the points look equally distributed?

c)  (BONUS QUESTION) Given that this dataset contains a lot of information, do some exploratory data analysis of its primary content. You have to produce at least two outcomes, and you are free to choose any type you desire - could be tables, graphs, maps, etc.

Question 4

We will put together all the datasets we have been working with in this question. After that, we will visually check if the ‘parallel trends’ assumption necessary for the fixed effects regression we will be running seems to hold, and get ready to run and interpret our model results.

a) Prof. Austin mentioned an identifying assumption that needs to be proved before running a fixed effects regression:  the ‘parallel trends’ assumption.  Using the data from (b), you will need to plot a graph that displays if such an assumption holds in our case.  What we want to see graphed is:  the years going from 2013 to 2018 as the x-axis; the average of total crimes that happened close to schools as the y-axis; and two lines, one for the mean total of crime per year that occurred near treated schools and one for control schools. Also, ensure you add a vertical line indicating the year when the Safe Passage Policy was introduced for the treated schools. Briefly describe your results. If you want a brief recap of this assumption, this video can be a helpful source.

Hint 1: you might have to join the dataframe from (a) with the dataframe from question 2

Hint 2: to plot a vertical line in your graph indicating when the safe passage policy started, you can use the geom_vline’function

Question 5

In this final exercise, we will run the analytics model that will answer the impact of the Safe Passage Policy on both crime occurrence close to schools and school attendance.

a) Run a fixed effects regression that has the total crime variable as the dependent variable (y) and the variable indicating the presence of the Safe Passage Policy as the independent variable (x). It should also include both‘school’fixed effects and‘year’fixed effects.  The function you should use is called ‘plm,’ and it comes from the‘plm’package (you will need to install it).

a.1) You will be using five arguments of this function. The‘model’argument should be set equal to‘wihtin’, and the‘effect’argument should be set equal to‘twoways’.  Use the‘coeftest’ function over you regression object instead of the ‘summary’ function. Briefly describe your results and interpret the coefficient you got.

Hint: you will need to create a column that indicates the ‘presence’of the Safe Passage Policy that has ‘yes’ for every year where a school was part of the Safe Route Passage policy and‘no’otherwise.

b) Run the same regression you ran for (a), but now using school attendance as the dependent variable  (y) and the presence of the Safe Passage Policy as the independent variable (x).  Use the‘coeftest’ function over you regression object instead of the‘summary’ function.  Briefly describe your results  and interpret the coefficient you got.