Foundational Skills Workshop - PSET #2

Available Weds, 10/04 at 5p

Due Fri, 10/13 at 5p

 

TABLE OF CONTENTS

 


GOAL

DATA

INSTRUCTIONS (also appears in Canvas)

PART 1. START A .DO FILE USING BEST PRACTICES (6 points)

PART1, Q1:

PART 2. LAY OF THE LAND (4 points)

PART2, Q1.

PART2, Q2

PART2, Q3

PART2, Q4

PART 3. DESCRIPTIVE STATISTICS (3 points)

PART3, Q1

PART3, Q2

PART3, Q3

PART 4. CONSTRUCTING NEW VARIABLES (2 points)

PART4, Q1

PART4, Q2

PART 5. FIRST-TIME KINDERGARTENERS (4 points)

PART5 Q1

PART5 Q2

PART5 Q3

PART5 Q4

PART 6. REFLECTIONS AND FINAL UPLOADS (6 points)

PART6 Q1

PART6 Q2

PART6 Q3


 

GOAL

The goal of this assignment is to give you more practice using descriptive commands in Stata and constructing variables using a large-scale dataset. This assignment builds on previous material and requires careful use of the provided documentation materials.

 

DATA

● This week’s data comes from the Early Childhood Longitudinal Study - Kindergarten Cohort (ECLS-K).

● The dataset is called “PSET2_FSW2022_ECLSK.dta”.

● Please Note: We have also provided a User’s Guide that explains how the data is coded. You will need it to answer several questions.

● Description of ECLS-K: The Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K) focuses on children's early school experiences beginning with kindergarten and following children through middle school. The ECLS-K data provide descriptive information on children's status at entry to school, their transition into school, and their progression through 8th grade. The longitudinal nature of the ECLS-K data enables researchers to study how a wide range of family, school, community, and individual factors are associated with school performance. The data for the homework come from the first wave of the ECLS-K, which was collected in the fall of 1998 when the children were beginning Kindergarten. To make the data manageable, many variables were omitted and only children who were living in the South are included.

 

INSTRUCTIONS (also appears in Canvas)

- You may make yourself a copy of this FSW_PSET2_2023 Google Doc that contains all of the PSET 2 questions to guide your work and help organize your .do file. But you will actually answer the questions in a Canvas Assessment (like you usually do for Check-Ins).

- I STRONGLY ENCOURAGE YOU TO READ THE ENTIRE GOOGLE DOC BEFORE YOU BEGIN.

- Consider answering all the questions in your own .do file and then, at the end, going into Canvas to submit your answers and upload your .do file and log file.

- In Canvas, you will also upload your .do file using the naming convention Wk07_PSET2_Lastname.do (e.g., Wk07_PSET2_Atteberry.do). The goal is to make sure it runs from start to finish without errors.

- In Canvas, you will also upload your log file using the naming convention Wk07_PSET2_Lastname_log.pdf

- While attempting the PSET, you may confer with other students, utilize all materials on Canvas and your own notes, and you may use the internet. However, you may not literally co-write code together, and you may not copy/paste code or answers from another student. You also may not examine a PSET from a student who previously took FSW. If you work with or confer with other students, please list them as collaborators in the header of your .do file. Each student must submit their own .do file and log file, which they write themselves.

- For more guidance on the PSETs, please refer back to the Grading and Assessments section of the syllabus.

 


PART 1. START A .DO FILE USING BEST PRACTICES (6 points)

Create a .do file using the .do file template that Allison presented in class and that you altered to reflect your own LPPP 6001 folder location. Name your .do file Wk07_PSET2_YourLastName.do. In your .do file:

● Include a header and globals for the classpath and other class folders (1pt).

● Include command(s) in your .do file to open the data from your “2 Raw Data” folder (2pt).

● Write command(s) in your .do file to begin a log file in your “5 Log Files” folder, to close that log file at the end, and to translate it into a .pdf called Wk07_PSET2_YourLastName_log.pdf (2pt).

● Be sure to use comments throughout your .do file, especially to indicate which question you’re answering. For examples of what a final .do file could look like, examine .do files posted by the professor on Canvas. We will examine your uploaded .do file (see PART6 Q2) to assess these aspects of your work product.
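
As a rough illustration of the pieces listed above, a .do file skeleton might look like the sketch below. It is not the class template. The folder path in the classpath global and the macro names raw and logs are hypothetical placeholders, so substitute your own. Translating a log to .pdf is assumed to be available in your installation of Stata.

```stata
* ---------------------------------------------------------------
* Wk07_PSET2_YourLastName.do
* Author:        Your Name
* Collaborators: (list anyone you conferred with)
* Purpose:       PSET 2, descriptive statistics with the ECLS-K
* ---------------------------------------------------------------
clear all
set more off

* Globals for folder locations (hypothetical path; edit to match yours)
global classpath "C:/Users/yourname/LPPP 6001"
global raw       "$classpath/2 Raw Data"
global logs      "$classpath/5 Log Files"

* Start a log file (close any log that is already open first)
capture log close
log using "$logs/Wk07_PSET2_YourLastName_log.smcl", replace

* Open the data from the raw data folder
use "$raw/PSET2_FSW2022_ECLSK.dta", clear

* PART 2, Q1 (analysis commands go here, one commented section per question)

* Close the log and translate it into a PDF
log close
translate "$logs/Wk07_PSET2_YourLastName_log.smcl" "$logs/Wk07_PSET2_YourLastName_log.pdf", replace
```

Defining the folders once as globals means the rest of the file never repeats a full path, so the same .do file runs on another computer after changing a single line.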

 

PART1, Q1:

In your own words, how successful were you with PART 1? What parts did you struggle with the most, if any? How has your understanding of organizing files for analysis using Stata developed since you turned in PSET1? (1pt). [Short answer in Canvas]

 


PART 2. LAY OF THE LAND (4 points)

PART2, Q1.

How many observations are in this dataset?

Your Answer: There are ____ observations in this dataset.

[Note: Enter all your answers in Canvas. I’m showing you here what each question looks like in Canvas, for reference.]

 

PART2, Q2

How many variables are in this dataset?

Your Answer: There are ____ variables in this dataset.

 

PART2, Q3

How many different schools are included in this dataset? [Hint: The codebook command will be useful here]

Your Answer: There are ____ schools in this dataset.
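
For orientation, the commands below show one way to get this kind of overview. The sketch uses the auto example dataset that ships with Stata rather than the ECLS-K, so the variable rep78 only stands in for whichever variable you need to inspect.

```stata
sysuse auto, clear      // example dataset shipped with Stata

describe, short         // reports the number of observations and variables
count                   // number of observations
codebook rep78          // reports, among other things, the number of unique values
```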

 

PART2, Q4

Many variables in the dataset begin with the letters “p1”. What does this mean? What does it mean when variables start with the letters “c1”? What about “t1”? [Short answer in Canvas]

 

 


PART 3. DESCRIPTIVE STATISTICS (3 points)

IMPORTANT NOTE: Be careful about missing values. See the data documentation to see how missing data is identified in the dataset (e.g., Section 7.2). Throughout this PSET, do not include the missing data when calculating the descriptive statistics to answer the questions, or you will get the wrong results. Take care of missing values as needed (e.g., recode them, use if statements to exclude them, etc.).
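
The two approaches mentioned in the note can be sketched as follows. The example uses the auto dataset that ships with Stata and builds a toy variable in which -9 plays the role of a missing-data code. The code -9 is only an assumption for illustration. Take the codes that actually apply to the ECLS-K from the User's Guide.

```stata
sysuse auto, clear

* Build a toy variable that uses -9 as a "missing" code (illustration only)
generate rep78_coded = rep78
replace  rep78_coded = -9 if missing(rep78)

* Option 1: exclude the code with an if qualifier
tabulate rep78_coded if rep78_coded != -9

* Option 2: convert the code to Stata missing once, then analyze as usual
mvdecode rep78_coded, mv(-9)
tabulate rep78_coded            // tabulate omits missing values by default
summarize rep78_coded           // so does summarize
```

Keep in mind that Stata treats a missing value as larger than any number, so a condition such as `if x >= 50` is also true for observations where x is missing unless you add `& !missing(x)`.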

PART3, Q1

There are 4 types of schools in this dataset (cs_type2). I’d like you to compare and contrast the chances that students of different ethnoracial categories (see race variable) attend public school by conducting the following analyses: What percentage of White students (labeled WHITE, NON-HISPANIC) attend public schools (labeled PUBLIC/DOD/BIA)? What percentage of Black students (BLACK OR AFRICAN AMERICAN, NON-HISPANIC) attend public schools? What percentage of Hispanic students (both RACE SPECIFIED and RACE NOT SPECIFIED) attend public schools?

Please write 1-2 sentences about what you learned. You do not need to include the Stata output, since we can see that in your do/log file, but do include the percentages you calculated in the sentences of your write-up. [Short answer in Canvas]

 

PART3, Q2

What percentage of children in this sample speak a primary home language other than English (wklangst)?

Your Answer:  ____% of children in this sample speak a primary home language other than English (round to 2 decimals).

 

PART3, Q3

What percentage of children in this sample have a mother who has attained a level of education beyond a bachelor’s degree (wkmomed)?

Your Answer:  ____% of children in this sample have a mother who has attained a level of education beyond a bachelor’s degree (round to 2 decimals).

 

 


PART 4. CONSTRUCTING NEW VARIABLES (2 points)

IMPORTANT NOTE: Continue to address missing values as described in PART 3.

 

PART4, Q1

Daily Reading: Generate a dummy variable called daily_read, based on p1readbo. Your new daily_read dummy should = 1 if the parent reported reading to their child EVERYDAY, and = 0 if the parent reported reading to their child less than daily. Be sure to label your variable and its values. Now tab your new variable (exclude missing values) and paste the output from your Results Window into Canvas (be sure to put it in Courier New font).

 [Short answer in Canvas, Courier New font]
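
The general pattern for building and labeling a dummy variable can be sketched with the auto dataset that ships with Stata. The variable high_rep, its cutoff, and the label name high_rep_lbl are made up for this illustration. They are not part of the ECLS-K.

```stata
sysuse auto, clear

* Dummy = 1 if rep78 is 4 or higher, 0 if lower, missing if rep78 is missing
generate high_rep = (rep78 >= 4) if !missing(rep78)

* Label the variable and its values
label variable high_rep "Repair record of 4 or higher"
label define high_rep_lbl 0 "Below 4" 1 "4 or higher"
label values high_rep high_rep_lbl

tabulate high_rep               // missing values are excluded by default
```

The `if !missing(rep78)` qualifier matters: without it, observations with a missing rep78 would be assigned 1, because Stata treats missing as larger than any number.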

 

PART4, Q2

The ECLS-K provides standardized math scores (c1mtscor) and reading scores (c1rtscor) at Kindergarten entry. Using the generate command, construct a new variable called c1_avg_tscor that is equal to the average score on these two assessments (e.g., add the two variables and divide by two). Be sure to label your new variable. What are the minimum, maximum, and mean values of your newly generated variable?

Your Answer: The mean of the new variable is ____, the minimum is ____, and the maximum is ____ (round all to 2 decimals).
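
A minimal sketch of averaging two variables and reading off the summary statistics, again using the auto dataset that ships with Stata. The variables mpg and trunk are arbitrary stand-ins, and their average has no substantive meaning.

```stata
sysuse auto, clear

* Average of two variables; the result is missing if either input is missing
generate avg_mpg_trunk = (mpg + trunk) / 2
label variable avg_mpg_trunk "Average of mpg and trunk (illustration only)"

summarize avg_mpg_trunk         // reports observations, mean, std. dev., min, max
display %9.2f r(mean)           // stored results can be displayed to 2 decimals
```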

 


PART 5. FIRST-TIME KINDERGARTENERS (4 points)

IMPORTANT NOTE: Continue to address missing values as described in PART 3.

PART5 Q1

Start by generating a dummy variable, c1mtscor50plus, which is set equal to 1 if the child’s math score when they enter Kindergarten (c1mtscor) is 50 or above, 0 if it is below 50, and missing if c1mtscor is missing (label variable and values, as always). Among children with non-missing data on this variable, what percent have a math score 50 or above?

Your Answer: Among children with non-missing data on this variable, ____% have a math score 50 or above (round to 2 decimals).

 

PART5 Q2

Among children with non-missing data, what percentage of children are considered “below the poverty threshold” (wkpovrty)?

Your Answer:  Among children with non-missing data on this variable, ____% are in poverty (round to 2 decimals).

 

PART5 Q3

Construct a categorical variable called pov_math, based on wkpovrty and your new c1mtscor50plus dummy. Your new variable should have 4 values:

● 1 = below poverty threshold, and has a math score < 50

● 2 = below poverty threshold, and has a math score >= 50

● 3 = at/above poverty threshold, and has a math score < 50

● 4 = at/above poverty threshold, and has a math score >= 50

Anyone with missing data for either of the two underlying variables should have a missing value for your new variable pov_math. As always, label your variable and its values. Now tab your new variable (exclude missing values) and paste the output from your Results Window into Canvas (be sure to put it in Courier New font).

 [Short answer in Canvas, Courier New font]
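
One common way to combine two binary variables into a four-category variable is to start with missing everywhere and then fill in each combination. The sketch below does this with the auto dataset that ships with Stata. The names high_rep, origin_rep, and origin_rep_lbl are invented for the illustration.

```stata
sysuse auto, clear

* Second binary variable (foreign is already coded 0/1 in this dataset)
generate high_rep = (rep78 >= 4) if !missing(rep78)

* Start with missing everywhere, then fill in each combination
generate origin_rep = .
replace  origin_rep = 1 if foreign == 0 & high_rep == 0
replace  origin_rep = 2 if foreign == 0 & high_rep == 1
replace  origin_rep = 3 if foreign == 1 & high_rep == 0
replace  origin_rep = 4 if foreign == 1 & high_rep == 1

* Label the variable and its values
label variable origin_rep "Origin by repair record"
label define origin_rep_lbl 1 "Domestic, rep78 < 4" 2 "Domestic, rep78 >= 4" ///
                            3 "Foreign, rep78 < 4"  4 "Foreign, rep78 >= 4"
label values origin_rep origin_rep_lbl

tabulate origin_rep
```

Because each condition tests for an exact 0 or 1, an observation that is missing on either input never matches and therefore stays missing on the new variable.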

 

PART5 Q4

Write 1-2 sentences about what you have learned about poverty status and math scores at the start of K.

 [Short answer in Canvas]

 


PART 6. REFLECTIONS AND FINAL UPLOADS (6 points)

 

PART6 Q1

How much time did it take you to complete this homework assignment? What was the most challenging part? Do you have any questions about the material presented so far? (1pt)

[Short answer]

 

PART6 Q2

Upload your .do file using the naming convention Wk07_PSET2_Lastname.do. The goal is to make sure it runs from start to finish without errors. (up to 4 points for code quality/clarity/functionality)

[File Upload in Canvas]

 

PART6 Q3

Upload your log file using the naming convention Wk07_PSET2_Lastname_log.pdf (1pt)

[File Upload in Canvas]