
EC203 Stata Assignment #1 Fall 2023

Read and explore data, summarize variables and graph distributions and relationships

NOTE: This assignment has TWO PARTS:

1.    Submitting your log file and do-file on Gradescope and

2.   Answering questions on Gradescope.

TIP FOR SUCCESS:

•    Use comments throughout to create a “roadmap” for both you and the TA

•    You are not required to write the answers in your do-file, but doing so may help you. It would also create a reference that combines the concepts with the actual data cleaning and analysis.

1.    Open Stata

2.    Begin a new do-file

3.    Download the data set from Blackboard under “Stata Assignment #1”

4.   Turn the Excel data set into a Stata-formatted data set.**

5.    Save the data set as lastname_firstname_ps1_data.

6.    Report the number of variables and the number of observations.**

7.    Generate new variables from existing ones.

8.    Label variables.

9.    Summarize a variable using the sum command; interpret a few summary statistics.**

10. Generate a histogram.

11. Calculate two z-scores.**

12. Generate a new binary variable based on the values of an existing variable.

13. Summarize a variable based on the values of another variable.**

14. Replace 0s with missing values for certain variables.

15. Summarize the same variable after the changes made above.**

16. Create a scatterplot with a trend line.

17. Clean up the scatterplot.

18. Save the data set.

19. Triple-check your do-file has all (successful) commands.

20. Rerun the do-file to check for errors.

21. Start a log file and run your do-file from beginning to end. DOUBLE-CHECK your log file.

22. Submit your log file and your do-file in Gradescope.

**Means there is an associated Gradescope question (or multiple) with this task.

***Submit your do-file & log file on Gradescope before Monday, Oct. 30th  at 11:59pm. No exceptions! “Late” begins at 12:05am ET (Boston time) on Tuesday, Oct. 31st  and you lose 10% per day after this  deadline.

****BE SURE YOU SUBMIT THE CORRECT FILES IN THE LOG FILE ASSIGNMENT AND DO-FILE ASSIGNMENT ON GRADESCOPE!!

Getting Started with Any Stata Project

READ THIS:

•    The instructions below should help you through each step; please read them carefully.

•    Show all your work in the do-file (Copy/paste ALL (successful) commands into the do-file)

•    You will be graded on both commands in your do-file and your answers to the questions on Gradescope.

1.    Open Stata.

2.    Begin a new do-file. Click on “New Do-file Editor.”

a.    Use comments to place your name, BU ID, “Stata Assignment #1”, and “EC203 Fall 2023” on four separate lines. Note that you can type “help comment” in the Command window to get more info. To get information about any command in Stata, you can use the “help” command.

b.   Save the do-file as “lastname_firstname_ps1.do” – (I suggest inside an EC203\Stata\Assignment1 folder that you create.)

c.    Copy/paste the following lines underneath your personal information (separated by at least one line) to ensure ease of running the do-file from beginning to end without error:

clear all

set more off

capture log close

Note: to receive full credit, you must use comments to provide a “roadmap” for all of the different commands that will be in your do-file. For example, use “Getting the mean and standard deviation” as a comment when using the sum command. Comments help you follow along with what each group of lines is designed to do. Also, chunks of commands can be separated by a few empty lines so that you can easily browse and find certain parts of your do-file(s). Note that sometimes the command is obvious (no comment required), like when you are using the rename command to rename a variable. In this case, please at least put the question # in your do-file as a comment.
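As a rough illustration of the header and “roadmap” comments described above, the top of a do-file might look like the sketch below. The name and BU ID are placeholders, and the question comments are only examples of wording, not required text.

```stata
* Jane Doe                (placeholder name)
* U00000000               (placeholder BU ID)
* Stata Assignment #1
* EC203 Fall 2023

clear all
set more off
capture log close

/* Question 7: generate percent scores
   (a block comment like this one can span several lines) */

// Question 9: getting the mean and standard deviation
```

Stata accepts all three comment styles shown here in a do-file: a line starting with *, a // comment, and a /* ... */ block.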

3.    Download the Stata Assignment #1 Data Excel file from Blackboard.

•    Save this data set in the same folder in which you saved your do-file.

4.    Bring the data into Stata: You need to bring the Excel file that you downloaded into Stata to use Stata for your data analysis. Use the help command and Stata documentation to understand the  difference between the use command and the import command. Pick the correct command and  use it to bring the Excel file into Stata. Make sure this command is included in your do-file. (Hint: you’ll need to add/click an option to tell Stata to treat the first row in Excel as variable names).

a.    In GRADESCOPE, answer a question about the difference between use and import.
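To see the “first row as variable names” option in action without touching the assignment file, here is a small self-contained sketch. It writes a made-up toy data set to Excel and reads it back; the file name toy_scores.xlsx and the variable names are assumptions for illustration only.

```stata
* Build a tiny toy data set (made-up values, hypothetical variable names)
clear
input id midterm finalexam
1 40 20
2 35 18
3 45 22
end

* Write it to an Excel workbook with the variable names in row 1
export excel using "toy_scores.xlsx", firstrow(variables) replace

* Bring the workbook back in, treating row 1 as variable names
import excel using "toy_scores.xlsx", firstrow clear
describe
```

Without the firstrow option, Stata would read the header row as data and name the variables A, B, C.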

5.   You should now have your data in Stata. It can be extremely helpful to save your data at this point as a Stata data set. (Remember it is only an Excel file being viewed within Stata at this point.) To save the data in memory as a Stata data set, use the command save. Save this data set using the File drop-down menu, naming it lastname_firstname_ps1_data.

a.    Be sure to add , replace to the end of this line when you copy/paste it to your do-file. It should look like the line below in your do-file. DO NOT accidentally also copy/paste the “result” after the save command. (The command has the . in front; the “result” is below it.)

save [path]/lastname_firstname_ps1_data, replace

•    Notice that the file type by default is .dta

i.   This is the file extension for Stata-formatted data!

•    Also notice that the first time you do this, it will show the “result” that “[path]/Gelsheimer_Stacey_ps1_data” not found. (Note: this is the result you do NOT copy/paste into your do-file.) This result appears because you have said replace if it exists so that you can run your do-file again, but the file does not exist the first time you run it. You do not need to be alarmed by this result; the command should still run successfully without error. (There should be no red result.)

6.    Notice the Properties window inside the Stata interface. In GRADESCOPE, answer the following:

a.    How many variables are there in this data set?

b.    How many observations are there in this data set?
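Besides reading the Properties window, the same two counts can be pulled up with commands. The sketch below uses made-up toy data with hypothetical variable names, so the numbers it reports are not the assignment's answers.

```stata
* Toy data (made up) just to have something in memory
clear
input id midterm finalexam
1 40 20
2 35 18
3 45 22
end

* Short description: reports the number of observations and variables
describe, short

* The same two numbers are stored in c(k) (variables) and c(N) (observations)
display "variables: " c(k) "   observations: " c(N)
```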

7.   Generate two new variables:

a.    One which equals the percent score for the midterm (call it midterm_perc).

i.   The midterm was out of 50 points.

b.   One that equals the percent score for the final exam (call it finalexam_perc).

i.   The final exam was out of 25 points.

c.    For example, someone who earned 40 points on the midterm should have a value of 80 (40/50*100) for the new midterm percent variable.

i.    Note that generating a new variable that equals some mathematical expression involving another variable will calculate the value for each and every observation all at once.

ii.    Example: gen x=y+2 will take every observation’s value of y, add 2, and that new value will be the value for that observation’s x (and this will happen for every observation all at once)!
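The percent calculation from 7c can be sketched on toy data as follows. The raw variable names midterm and finalexam are assumptions; the real data set may name them differently.

```stata
* Toy scores (made up); raw variable names are hypothetical
clear
input midterm finalexam
40 20
50 25
25 10
end

* Each line computes the new value for every observation at once
generate midterm_perc   = midterm / 50 * 100
generate finalexam_perc = finalexam / 25 * 100

* The first student (40 midterm points) shows midterm_perc = 80
list
```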

8.    Label the new variables:

a.    Label the variable midterm_perc with “Midterm Score (%)”

b.    Label the variable finalexam_perc with “Final Exam Score (%)”
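A minimal sketch of the labeling step, assuming the final exam percent variable is named finalexam_perc as in step 7 and using placeholder values so the block runs on its own:

```stata
* Placeholder data so the variables exist (values are made up)
clear
set obs 3
generate midterm_perc   = 80
generate finalexam_perc = 72

* Attach descriptive labels; they appear in describe output and on graph axes
label variable midterm_perc   "Midterm Score (%)"
label variable finalexam_perc "Final Exam Score (%)"
describe
```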

9.    In GRADESCOPE, answer a few questions related to the directions below:

a.    Use the sum command to see the average midterm and final exam scores (the original variables) across the entire data set. In GRADESCOPE, report the min, max and mean.

b.    Use the sum command to see the average midterm and final exam percentages (using your newly created variables) across the entire data set. In GRADESCOPE, report the    min, max and mean.

*******For the remainder of this assignment, you will be using the midterm and final exam grades as measured in percent (the newly created variables). You can ignore the score versions for the rest of the assignment.

c.    In GRADESCOPE, interpret the standard deviation of finalexam_perc.

d.    Now look at the detailed summary statistics for finalexam_perc and answer the corresponding questions in GRADESCOPE.
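For reference, here is a sketch of the basic and detailed forms of the summarize command (sum is its abbreviation), run on made-up percentages:

```stata
* Toy percentages (made up)
clear
input midterm_perc finalexam_perc
80 72
90 88
64 60
100 96
end

* Basic summary: observations, mean, standard deviation, min, max
summarize midterm_perc finalexam_perc

* Detailed summary: adds percentiles (including the median), skewness, kurtosis
summarize finalexam_perc, detail
```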

10. Generate a histogram of the midterm_perc variable, using frequencies (instead of densities) as the Y-axis variable and make the “width” of the bins = 15. Be sure that your graph has an appropriate title (such as “Midterm Grades EC203 Spring 2022”) and clean up the X-axis by changing its title to “Midterm Grades” (both without quotes). Be sure that the title and X-axis label end up in the code you copy/paste into your do-file (by modifying them in the dialogue box, NOT using the Graph Editor).
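The dialogue box should produce a command along the lines of the sketch below. The exact text Stata generates may differ slightly, and the data here are made up so the block runs on its own.

```stata
* Toy percentages (made up)
clear
input midterm_perc
55
64
70
80
82
90
100
end

* Frequencies on the Y-axis, bins 15 wide, with a title and an X-axis title
histogram midterm_perc, frequency width(15) ///
    title("Midterm Grades EC203 Spring 2022") ///
    xtitle("Midterm Grades")
```

The /// at the end of a line continues the command onto the next line inside a do-file.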

11. Use the display command to calculate two z-scores, one for an observation that earned a 93% on the midterm (using the midterm_perc variable) and one that earned a 75%.

a.    Before calculating the z-score, round all pieces involved in the formula to 1 decimal place.

b.    In GRADESCOPE, answer a few questions.
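A z-score is (value - mean) / standard deviation. One way to sketch the calculation with display is to reuse the results that summarize leaves behind in r(mean) and r(sd), rounding each to 1 decimal place first. The data below are made up, so the printed z-scores are not the assignment's answers; typing the rounded numbers by hand into display works just as well.

```stata
* Toy percentages (made up)
clear
input midterm_perc
60
75
80
93
100
end

* summarize stores the mean and standard deviation in r(mean) and r(sd)
quietly summarize midterm_perc

* Round each piece to 1 decimal place, then compute (x - mean) / sd
display "z for 93: " (93 - round(r(mean), 0.1)) / round(r(sd), 0.1)
display "z for 75: " (75 - round(r(mean), 0.1)) / round(r(sd), 0.1)
```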

12. Now generate a new binary variable (called above_med_midterm) that equals 1 for all students who earned above the median midterm exam (percentage) grade and 0 for all students who earned at or below the median. Do the same for final exam scores (finalexam_perc), generating a binary variable (above_med_final) that equals 1 for all students who earned above the median and 0 for all others.

VERY IMPORTANT NOTE TO NEVER FORGET WHEN USING STATA!

When you use an if statement in Stata that includes the inequality “>”, all missing values satisfy the condition, because Stata treats a missing value as larger than any number. This means that if a student missed the final exam, saying:

gen above_med_final = 1 if finalexam_perc>80

(for example; 80 is not the correct median) would assign a 1 to all students who earned above an 80 on the final exam AND to all students for whom there was no final exam grade (those that have a missing value). You can imagine why this might throw off any analyses you might do involving this new variable, because perhaps the student(s) who missed the final exam had lower HW grades, a lower midterm, etc. (and dropped the class). Assigning such a student to the “above median final exam grade” group would be the opposite of the truth! (See more details at the end of this document.) Here is how you fix it:

gen above_med_final = 1 if finalexam_perc>80 & finalexam_perc !=.    /*the “.” is the missing value*/

As you may already know, != means “does NOT equal”, so the last piece of this statement says to assign a 1 for this new variable to any student who has a final exam grade above 80 AND NOT equal to . (missing).

a.    CHECK FOR MISSING VALUES IN THESE VARIABLES. There are a few ways to see if a particular variable has missing values (and how many):

i.   When looking at summary statistics for a variable, the number of observations will be less than the total number of observations in the dataset (found in the Properties window). You could do a quick subtraction to figure out how many missing values there are. Alternatively (and more straightforward),

ii.   You can ask for the “codebook” of a variable.

1.   Type codebook midterm_perc finalexam_perc in your command box.

b.    In GRADESCOPE, answer a few questions.
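The missing-value warning and the checks in step 12a can be tried out on a tiny made-up example. In this sketch, 80 is the same placeholder threshold used in the note (not the actual median), above_80 is a hypothetical variable name, and the final exam percent variable is assumed to be named finalexam_perc as in step 7.

```stata
* Toy data (made up): two real scores and one missing final exam
clear
input finalexam_perc
90
70
.
end

* Pitfall: . is treated as larger than any number, so this counts 2 rows
count if finalexam_perc > 80

* Guarded condition: only the real score above 80 is counted (1 row)
count if finalexam_perc > 80 & finalexam_perc != .

* One possible 0/1 indicator that stays missing when the score is missing
generate above_80 = (finalexam_perc > 80) if !missing(finalexam_perc)

* Two ways to check for missing values
codebook finalexam_perc
count if missing(finalexam_perc)
list
```

Writing the indicator as a logical expression in parentheses gives 1 when the condition is true and 0 when it is false, which avoids a separate replace step for the 0s.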

13. Use the sum command to see the average midterm (%) grade ONLY for students who have a  value of 1 for above_med_final. Then use the sum command to see the average midterm (%) grade for the other group. In GRADESCOPE, answer a few questions.
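Summarizing one variable for a subset of observations uses an if qualifier with ==. A sketch on made-up data, where above_med_final is typed in directly rather than computed from a median:

```stata
* Toy data (made up): midterm percent and an above-median-final indicator
clear
input midterm_perc above_med_final
90 1
80 1
60 0
0  0
end

* Average midterm (%) for students above the median final exam grade
summarize midterm_perc if above_med_final == 1

* Average midterm (%) for the other group
summarize midterm_perc if above_med_final == 0
```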

14. Notice that one of the groups has at least one student that earned a 0 on the midterm (from the reported minimum). Remembering from class that “extreme values” can have large impacts on averages, let’s replace the values of 0 with missing so that they are not included in our calculation. (In reality, no one earned 0 points. Rather, they missed the exam, so their score should be reported as missing (rather than 0) for any real analyses.)

a.   That is, replace the values of midterm_perc that equal 0 with a . (dot/period)

i.   The symbol .  (dot/period) in Stata represents a missing value for a numeric variable.

ii.    Hint: This can be done in one line with a replace … if command.

iii.    Hint #2: Don’t forget about the difference between = and ==
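The two hints can be seen together in a short sketch on made-up data: a single = assigns the new value, while == tests for equality inside the if condition.

```stata
* Toy percentages (made up), including one 0 for a missed exam
clear
input midterm_perc
80
0
64
end

* = assigns the missing value; == tests which observations equal 0
replace midterm_perc = . if midterm_perc == 0

* The summary now uses only the 2 non-missing observations
summarize midterm_perc
```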

15. Now (after replacing the 0s with .) repeat Step #13 from above. In GRADESCOPE, answer a few questions.

16. Make a scatterplot with finalexam_perc as the Y variable and midterm_perc as the X variable, and add a line of best fit (under “fit plots” and “linear prediction”).

a.    Take a look at what it looks like in its most raw version. Is it ready to be presented to an audience, or could we clean it up and make it more presentable? You guessed it! Let’s    clean it up …

17. Add “Final Exam (%)” to the Y-axis using the dialogue box. Again, make sure you have a clear and concise title. (Use “Relationship between Midterm and Final Exam Grades”, without quotes.) Also, hide the legend.

a.    Be sure the command from Part #16 (the first graph code) and Part #17 (the improved     graph code) both end up in your do-file! Run the commands one at a time to notice how much better your graph looks!
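The raw and cleaned-up graphs from steps 16 and 17 correspond roughly to the two commands sketched below. The dialogue box may generate slightly different text, and the data are made up so the block runs on its own.

```stata
* Toy percentages (made up)
clear
input midterm_perc finalexam_perc
55 60
64 58
70 72
80 76
90 88
100 96
end

* Step 16: raw scatterplot with a linear fit line overlaid
twoway (scatter finalexam_perc midterm_perc) (lfit finalexam_perc midterm_perc)

* Step 17: same graph with a Y-axis title, a graph title, and no legend
twoway (scatter finalexam_perc midterm_perc) (lfit finalexam_perc midterm_perc), ///
    ytitle("Final Exam (%)") ///
    title("Relationship between Midterm and Final Exam Grades") ///
    legend(off)
```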

18. Save your data. Save the revised dataset once again using the same name as before and include the appropriate code in your do-file.

19. Now triple-check that your do-file has the commands from every step listed in this assignment.

20. Rerun your do-file from beginning to end and confirm it runs without error. (Be sure to check the results window!)

21. Start a log file using the drop-down menu. (File > Log > Begin)

a.   Add the log using line that Stata creates to the top of your do-file UNDERNEATH THE THREE LINES OF CODE THAT ended with capture log close.

b.   Add , replace at the end of the log using line so that Stata knows you’re willing to overwrite any version that currently exists if you rerun your do-file at a later time.
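With the log using line in place, the top and bottom of the do-file might look roughly like the sketch below. The log file name is a placeholder; use the line that Stata actually creates for you, with , replace added.

```stata
clear all
set more off
capture log close

* Placeholder log name; , replace overwrites an existing log of that name
log using "lastname_firstname_ps1.log", replace

* ... the rest of the do-file's commands go here ...

* Close the log at the very end of the do-file
log close
```

The capture prefix on log close keeps the do-file from stopping with an error when no log happens to be open at the start of a run.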

22. Your do-file should now have EVERY successful command that you ran, plus comments acting as a “roadmap” of what you are doing at various stages. Run your do-file one last time from beginning to end and make sure there are no errors. The results window should show “end of do-file” with no red errors if it successfully runs. Your log should also be completely clear of any errors. (Your log using line will need to have , replace at the end of it to overwrite any previous versions of your log file. BE SURE TO CHECK YOUR LOG FILE FOR COMPLETION AND ACCURACY PRIOR TO SUBMITTING IT!)

•   Congratulations! You’re done with Assignment 1! You should have familiarized yourself with various commands within Stata, how to load data into Stata, how to browse your data to better understand it, how to generate new variables, and how to visualize some of your variables using histograms and scatterplots. Well done! You should also now be able to do all of these same things for your Chelsea Project!

***Submit your do-file & log file on Gradescope before Monday, Oct. 30th  at 11:59pm. No exceptions! “Late” begins at 12:05am ET (Boston time) on Tuesday, Oct. 31st  and you lose 10% per day after this  deadline.

*****BE SURE YOU SUBMIT THE CORRECT FILES IN THE LOG FILE ASSIGNMENT DROPBOX AND DO-FILE ASSIGNMENT DROPBOX ON GRADESCOPE!! (Don’t accidentally submit your data set, or confuse the files and submit them into the wrong “dropbox”!)