EC 295 INTRODUCTORY ECONOMETRICS

ASSIGNMENT 1


This assignment is designed to introduce you to some basic data manipulation in STATA. Its aim is to provide you with some experience in applying the statistical methods we discuss in class.

In this assignment, I provide you with the relevant STATA commands that you will need, but you will have to come up with the appropriate syntax and options. For this, you should turn to these sources: 1. Dr. Smith’s very helpful STATA videos; 2. the “help” function in STATA; 3. the STATA Reference manuals, which are available within STATA as PDFs; and 4. Google. Some of these commands will be covered in the tutorials, so please review those Zoom videos. As always, you may come to me for help.


Instructions

In MyLearningSpace (MLS) you will find a data file called “assign1.dta”. Download this file, noting where you save it.

There is also a template do-file that you must use to write your own do-file. Save it in the same folder as your data. Please do the following to create your own do-file, using the template:

1. Rename the file as your surname, followed by your student number. For example, my file would be essaji123456789.

2. Replace the text in CAPITAL LETTERS in the do-file with appropriate information.

3. Leave everything else the same. Type your STATA commands between the lines “Insert your STATA commands below here” and “Insert your STATA commands above here”.
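
As a rough illustration of the kind of commands that usually come first between those two marker lines, here is a minimal sketch. It assumes that assign1.dta sits in STATA's current working directory; the template from MLS takes precedence wherever it differs.

```stata
* Hypothetical sketch only: follow the MLS template where it differs.
* Show the current working directory, to check that it is the folder
* holding the data file and the do-file.
pwd

* Load the assignment data, discarding any data already in memory.
use "assign1.dta", clear

* List variable names and labels, to find the variable each question refers to.
describe

* Open the built-in help for a command to look up its syntax and options.
help summarize
```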


Submission

To the Dropbox on MyLS, please submit the following.

1. A report containing all your answers to the questions. For each question, include the STATA code, if any, that you used, and the output generated by that code. Then, if you were asked to provide an interpretation, provide that.

2. Your STATA do-file.

3. Your STATA log-file.

To Gradescope, please submit your report.

Due Date

To account for the loss of a tutorial due to Victoria Day, the assignment is now due on Wednesday, June 9, 2021.

Assignment

Each question is worth five (5) points.

Analysis of a Single Variable

1. Using the summarize command, compute the 10th and 75th percentiles, the median, mean and standard deviation of the Fall reading score. Interpret each value.

2. Manually compute the t-statistic for testing the null hypothesis that the mean Fall reading score is 50.5, against the alternative that it does not equal 50.5. Hint: use the scalar command.
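
As a reminder, the one-sample t-statistic for a null hypothesis about a mean can be written as follows. The notation here is the standard textbook one and may differ slightly from the notation used in class.

```latex
t = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}
```

Here \bar{y} is the sample mean, s is the sample standard deviation, n is the number of observations, and \mu_0 is the value of the mean under the null hypothesis.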

3. Using a significance level of 5%, use the display command combined with the invttail function to compute the critical value for the hypothesis test. Do you accept or reject the null hypothesis? Explain why. Recall that the degrees of freedom are n-1.
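
For reference, the usual decision rule for a two-sided t-test at significance level \alpha can be sketched as follows, again in standard notation that may differ from the notation used in class.

```latex
\text{Reject } H_0 \text{ if } |t| > c,
\qquad \text{where } c \text{ satisfies } \Pr\left(T_{n-1} > c\right) = \frac{\alpha}{2}
```

Here T_{n-1} denotes a random variable with a t distribution with n-1 degrees of freedom, and t is the computed test statistic.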

4. Using the ci command, compute the 95% confidence interval for the mean Fall reading score. What is the set of null hypotheses that we would accept at the 5% level?
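
As a reminder, a two-sided confidence interval for a mean with confidence level 1 - \alpha is usually written as follows. This is the standard textbook formula, stated here as a sketch rather than as the exact notation used in class.

```latex
\left[\; \bar{y} - c \cdot \frac{s}{\sqrt{n}}, \;\; \bar{y} + c \cdot \frac{s}{\sqrt{n}} \;\right],
\qquad \Pr\left(T_{n-1} > c\right) = \frac{\alpha}{2}
```

Here \bar{y} is the sample mean, s is the sample standard deviation, n is the number of observations, and c is the critical value from the t distribution with n-1 degrees of freedom.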

5. Using the ci command, construct the 99% confidence interval for the mean Fall reading score. Explain why this interval is wider than the 95% interval.

6. Using the ttest command, perform a one-sided test where the null hypothesis is that the mean Fall reading score is 51.5, against the alternative hypothesis that the mean is less than 51.5. Based on the p-value for this test, what is the range of significance levels that would lead you to reject the null hypothesis?

7. Using the graph bar command, plot the mean Fall reading score of children who had a low birth weight, versus those who did not. Label the bars with the mean test score for each group (Hint: this is the bar “height”). You will need to add an option to do the latter.

Joint Distributions: Discrete Random Variables

8. Use the tabulate command to generate the joint probability distribution between “Student took pre-k” and “Low birth weight”. To generate this table, you will need to add an option to the tabulate command. While it will not provide you exactly the answer you are looking for, watching Dr. Smith’s video “Intro to Stata: Summarizing Data” will be helpful.

9. Use the tabulate command to generate the probability distribution for “Low birth weight”. Then use the tabulate command to produce the probability distribution for “Low birth weight” conditional on the student having taken pre-k. How does the probability that the student had a low birth weight change when you look only at students who took pre-k? Based on this, is having a low birth weight independent of having taken pre-k?

Correlations between Continuous Variables

10. Using the correlate command, compute the covariance between Fall reading scores and parental income. Now use the generate command to create a new variable called income2, equal to parental income divided by 1000, and compute the covariance between Fall reading scores and income2. Compare the two covariances and explain any differences.

11. Using the correlate command, compute the correlation between Fall reading scores and parental income. Now compute the correlation between income2 and Fall reading scores. Compare the two correlations and explain why these values differ, or why they do not.

12. Obtain the mean Fall reading score by the number of years of teaching experience by typing

egen meanread1=mean(readtc1), by(totyrstchkp)

Using the twoway scatter command, draw a scatterplot of average Fall reading scores against teacher experience. Based on this graph, do reading scores and teacher experience seem independent?

13. Create a variable containing the mean Spring reading score by the number of years of teaching experience. Using the twoway scatter command, draw a scatterplot of average Spring reading scores against teacher experience. Based on this graph, do reading scores and teacher experience seem independent? Comparing this scatterplot with the one in (12), can we say anything about the effect of teaching experience on reading scores?

14. Suppose we wanted to model the relationship between Fall reading scores and parental income using the following linear regression model:

reading score = β0 + β1 × income + u

Interpret β0, β1, and u.