EC204 Midterm 1 Spring 2025

Published: 2025-10-10

EC204

Midterm 1 Spring 2025 with Answers

Suppose I have a dataset containing the same S&P 500 companies observed each year from 2015-2024. The data collected include:

· age of company (<1 year, 1-4 years, 5-9 years, 10+ years)

· average stock price last year ($)

· gender of the CEO

· number of employees

· industry (manufacturing, technology, services, etc.)

1. Match the variable with its type

Quantitative: write Q on the line

Categorical and Nominal: write N (Fall 2025: We didn’t cover this)

Categorical and Ordinal: write O (Fall 2025: We didn’t cover this)

Variable	Type of Variable (write Q, N or O only)
Age of company (<1 year, 1-4 years, 5-9 years, 10+ years)	O
Average stock price last year ($)	Q
Gender of the CEO	N
Number of employees	Q
Industry	N

2. What type of dataset is this?

☐ Cross-sectional

☐ Time series

☐ Panel

☐ Pooled cross-sectional

3. The “unit of observation” for this dataset is (hint: what makes each observation unique?)

☐ Dollars

☐ Male/female

☐ Number of employees

☐ Industry type

☐ Company

☐ Company-year

4. A properly-structured dataset would have the variables in different _______ and the observations in different _______.

☐ columns; columns

☐ rows; columns

☐ columns; rows

☐ rows; rows
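Questions 2 to 4 describe a panel laid out with one row per company-year and one column per variable. A minimal Python sketch of that layout follows; the company names and values are made up for illustration and are not from the exam dataset.

```python
# Hypothetical panel data: each row is one company-year observation,
# each key (column) is one variable. All values are made up for illustration.
rows = [
    {"company": "AlphaCo", "year": 2015, "employees": 1200, "industry": "technology"},
    {"company": "AlphaCo", "year": 2016, "employees": 1350, "industry": "technology"},
    {"company": "BetaInc", "year": 2015, "employees": 800, "industry": "services"},
    {"company": "BetaInc", "year": 2016, "employees": 790, "industry": "services"},
]

# The unit of observation is the combination that identifies a row uniquely.
company_year_keys = [(r["company"], r["year"]) for r in rows]
company_keys = [r["company"] for r in rows]

is_company_year_unique = len(set(company_year_keys)) == len(rows)
is_company_unique = len(set(company_keys)) == len(rows)

print("company-year identifies each row:", is_company_year_unique)
print("company alone identifies each row:", is_company_unique)
```

Because the same company appears once per year, the company name alone repeats across rows, while the company-year pair does not.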

5. Sampling distributions: Suppose I hypothesize that the average test score for the EC203 midterm last semester is 80 points (out of 100). If this is true and the standard deviation of test scores equals 16 points, then 95% of all of the possible samples of 64 tests that I could randomly draw will give me an average between: (round to 1 decimal place)

76.08 and 83.92

Standard error of Xbar = 16 / sqrt(64) = 2

80 +/- 1.96*2 = (76.08, 83.92)
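The arithmetic for question 5 can be checked with a short Python sketch. It assumes the normal approximation for the sampling distribution of the sample mean and uses the 1.96 critical value from the worked answer; the variable names are illustrative.

```python
import math

mu = 80.0      # hypothesized population mean (from the question)
sigma = 16.0   # standard deviation of test scores (from the question)
n = 64         # sample size (from the question)
z = 1.96       # critical value for the middle 95% of a normal distribution

# Standard error of the sample mean, then the interval around the hypothesized mean.
se = sigma / math.sqrt(n)
lower = mu - z * se
upper = mu + z * se

print(f"SE = {se:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```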

6. Suppose that and .

Calculate the t-statistic and write it in the box below.

Is the coefficient statistically significant at any level? In other words, is it statistically different from 0 (at any significance level)?

☐ Yes

☐ No

Fall 2025: I told you I wouldn’t ask you to know 1.645, 1.96 or 2.576

If instead the standard error of the coefficient equals 10, is the coefficient statistically significant at any level? In other words, is it statistically different from 0 (at any significance level)?

☐ Yes

☐ No

7. If the research question is: What is the effect of sleep on exam scores?

then the dependent variable is _______ and we call it the _______ variable.

☐ sleep; Y

☐ sleep; X

☐ exam score; Y

☐ exam score; X

8. An economist is studying the relationship between a company's investment in research and development (R&D) (X) and its profits (Y). Initial analysis shows a strong positive correlation: higher R&D spending is associated with higher profits. However, the economist suspects that the company's size (Z) might be a confounding variable.

Why might company size be a confounding variable in this analysis?

☐ Larger companies tend to have more resources to invest in R&D, but company size doesn't directly influence profits. (Plausible, but incorrect, as size likely affects both R&D investment and profits.)

☐ Company size is only weakly related to R&D spending and has a negligible effect on profits. (This downplays the potential influence of size, making it an incorrect choice.)

☐ Larger companies are more likely to have the resources to invest heavily in R&D, and they also tend to have higher profits due to economies of scale and market dominance. (This is the correct answer, highlighting how size influences both X and Y.)

☐ While there's a relationship between R&D and profits, company size has no correlation with either of those factors. (This suggests a spurious correlation, but it's unlikely that size is completely unrelated.)

REGRESSION #1:

*testscr: average test score in the school district (measured in points)

*expn_stu: dollar amount spent per student in the school district ($s)

Summary statistics for variables used:

	(1)	(2)	(3)	(4)	(5)
VARIABLES	N	mean	sd	min	max

testscr	420	654.2	19.05	605.6	706.8
expn_stu	420	5,312	633.9	3,926	7,712
str	420	19.64	1.892	14	25.80
avginc	420	15.32	7.226	5.335	55.33
el_pct	420	15.77	18.29	0	85.54

9. In the above regression, which variable is the dependent variable?

☐ testscr

☐ expn_stu

10. What is the research question that might have motivated this regression? (1 sentence (in the form of a question) only.)

How do expenditures per student affect test scores? (or some variation of this) Look for: the order of the variables

11. Write out the population regression model:

testscr = beta0 + beta1*expn_stu + u

12. beta0hat = 623.616 (round to 3 decimal places)

REGRESSION #1 REPEATED HERE FOR YOUR CONVENIENCE:

13. Interpret beta0hat in 1 sentence.
When expenditures per student are $0, the average test score of a district is expected to be 623.6 points.

Deduction if no mention of X = 0

Deduction if “average” or “expected” isn’t used

Deduction if picked wrong number to interpret

14. Is beta0hat “meaningful” in this regression (based on class discussions)?

☐ Yes

☐ No

15. Defend your answer above in 1 sentence.
No, because a school cannot operate without any funding/spending. Other answers may include: yes, because if the local district isn’t spending money, the federal (or state) government could be.

16. beta1hat = 0.006 (round to 3 decimal places)

17. Interpret beta1hat in 1 sentence.

A $1 increase in expenditures per student is associated with a 0.006-point increase in the expected test score. (Each underlined word is important for credit.)

18. What is expected to happen if a school increases its expenditures per student by $1000? Use your rounded answer above to answer this question.

Test scores would be expected to

☐ increase by (0.5 points)

☐ decrease by

6 points

(number) (units)
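The answer to question 18 scales the rounded slope by the size of the spending change. A minimal Python sketch, using the rounded slope of 0.006 from the answer key (variable names are illustrative):

```python
beta1_hat = 0.006      # rounded slope from the answer key (points per dollar)
delta_spending = 1000  # increase in expenditures per student, in dollars

# The expected change in Y is the slope times the change in X.
expected_change = beta1_hat * delta_spending
direction = "increase" if expected_change > 0 else "decrease"

print(f"Test scores are expected to {direction} by {abs(expected_change):.1f} points")
```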

19. Is beta1hat statistically significant? If so, at what significance level? (Brag about the best significance level.)

☐ NO

☐ YES, at the 10% significance level

☐ YES, at the 5% significance level

☐ YES, at the 1% significance level

20. Predict the average test score of a school district that spends the sample average amount of expenditures per student. (See the summary statistics above.) You can calculate it using the sample regression line or based on what you remember about predicting Y using the sample average of X discussed in class. If you calculate it manually, round beta0hat and beta1hat to 3 decimal places prior to calculating your answer and round your answer to 1 decimal place.

Place your answer in the box and show your work in the space below.

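Predicted test score = 623.616 + 0.006*5312 = 655.488 OR

654.2 because Yhat = Ybar when X = Xbar, and 654.2 was given as Ybar (the sample mean of testscr). The two answers differ only because beta1hat was rounded to 3 decimal places.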
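Both routes to the answer for question 20 can be compared in a short Python sketch. It uses the rounded estimates from the answer key and the sample means from the summary statistics. Since an OLS line with an intercept passes through the sample means, the gap between the two answers should come from rounding the slope.

```python
beta0_hat = 623.616   # rounded intercept from the answer key
beta1_hat = 0.006     # rounded slope from the answer key
mean_expn_stu = 5312  # sample mean of expn_stu from the summary statistics
mean_testscr = 654.2  # sample mean of testscr from the summary statistics

# Route 1: plug the sample mean of X into the (rounded) sample regression line.
predicted = beta0_hat + beta1_hat * mean_expn_stu

# Route 2: the prediction at the sample mean of X is the sample mean of Y.
gap = predicted - mean_testscr

print(f"Predicted test score at the sample mean of spending: {predicted:.3f}")
print(f"Gap relative to the sample mean of testscr: {gap:.3f}")
```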

21. Suppose a school spends the sample average per student and has an actual test score of 600. Is this school’s residual positive, negative or zero? Show your work in the space provided.

Predicted test score = 623.616 + 0.006*5312 = 655.49

Residual = actual - predicted = 600 - 655.49 = -55.49

☐ Positive

☐ Negative

☐ Zero

22. Is this school’s average test score being over- or underestimated?

☐ Overestimated

☐ Underestimated
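Questions 21 and 22 both follow from the sign of the residual, defined as the actual value minus the predicted value. A minimal Python sketch using the rounded estimates from the answer key (variable names are illustrative):

```python
beta0_hat = 623.616  # rounded intercept from the answer key
beta1_hat = 0.006    # rounded slope from the answer key
spending = 5312      # sample average expenditures per student
actual_score = 600   # this school's actual test score (from the question)

predicted_score = beta0_hat + beta1_hat * spending
residual = actual_score - predicted_score  # actual minus predicted

if residual > 0:
    verdict = "positive residual: the model underestimates this school"
elif residual < 0:
    verdict = "negative residual: the model overestimates this school"
else:
    verdict = "zero residual: the prediction is exact"

print(f"residual = {residual:.2f} ({verdict})")
```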

If one school spends $1000 more than another, its test score is expected to be 6 points (how many)

☐ lower

☐ higher

(select one)

23. Name two “measures of fit” for a regression.

· R-squared

· Standard Error of the Regression (SER or Root MSE is fine)

24. Pick one measure of fit you listed above. Report the number in the box and interpret the number in 1 sentence below.
R-squared: 3.66% of the variation in test scores is explained by expenditures per student. (Deduction if said “Y” and “X” rather than variable names.)

SER/Root MSE: When using this model to predict test scores, I am wrong (or my predictions are wrong) by about 18.7 points on average (or some variation of this). Deduction if they don’t say “on average”
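Both measures of fit can be computed from the residuals. The Python sketch below does this for a tiny made-up dataset (not the school district data), assuming a single regressor so that the SER divides by n - 2.

```python
import math

# Hypothetical (x, y) data for illustration only; not the school district data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for a single regressor.
beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
beta0_hat = y_bar - beta1_hat * x_bar

residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]
ssr = sum(u ** 2 for u in residuals)      # sum of squared residuals
tss = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares

r_squared = 1 - ssr / tss       # share of the variation in y explained by x
ser = math.sqrt(ssr / (n - 2))  # typical size of a prediction error, in units of y

print(f"R-squared = {r_squared:.4f}, SER = {ser:.4f}")
```

R-squared is unit-free, while the SER is measured in the units of the dependent variable, which is why the interpretation above reports it in points.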

25. What does assumption #1 E(u|X)=0 mean in the most non-statistical words you can use? (1 sentence)

· No relationship between u factors and our chosen X OR

· X and u are unrelated

· The factors inside u are unrelated to X

26. OLS calculates the estimates beta0hat and beta1hat that

☐ minimize (the sum of squared) residuals

☐ maximize (the sum of squared) residuals

☐ make (the sum of squared) residuals equal to 0

27. As the t-statistic gets larger, the

☐ Coefficient becomes less statistically significant

☐ Coefficient becomes more statistically significant

☐ Coefficient must stay the same

☐ Standard error gets larger

**THE QUESTION ABOVE LOOKS LIKE “#29” ON EXAM (didn’t catch the auto-numbering not applying to this question)

28. In econometrics, OLS stands for

☐ Ordinary linear squares.

☐ Optimal linear strategy.

☐ Ordinary least squares.

☐ Optimal linear squares.

**THE QUESTION ABOVE LOOKS LIKE “#28” ON EXAM (didn’t catch the auto-numbering not applying to this question)

Every question that follows from here is 1 number less on the actual exam than here (here: 29 → actual exam: 28)

29. Match the statistical significance from the list below with each coefficient.

Don’t worry about “bragging” here (about the “best” level).

PLACE THE CORRECT LETTER INSIDE EACH BOX BELOW THE CHOICES.

Please use an UPPERCASE A B C D E F G or H so that grading can be most easily automated.

A: Statistically significant at the 10%, 5% and 1% levels

B: Statistically significant at only the 10% and 5% levels

C: Statistically significant at only the 5% and 1% levels

D: Statistically significant at only the 10% and 1% levels

E: Statistically significant at only the 10% level

F: Statistically significant at only the 5% level

G: Statistically significant at only the 1% level

H: Statistically insignificant

: : : : :

***EACH BOX ABOVE SHOULD CONTAIN ONLY ONE LETTER A-H.
NOTE THAT NOT EACH CHOICE MUST BE USED AND
THE SAME CHOICE MAY BE USED MORE THAN ONCE.

*testscr: average test score in the school district (measured in points)

*above_avg_str: a dummy variable indicating whether the school district has an above-average student-teacher ratio

30. If my research question is “do school districts with large classes (a higher than average student-teacher ratio) have lower average test scores?”,

The dependent variable would be:

☐ A dummy variable indicating whether the school district has a “high” student-teacher ratio

☐ Test scores

31. If the variable is called “above_avg_str” and coded to indicate whether the school district had a “high” student-teacher ratio (meaning higher than the average student-teacher ratio in the dataset), then observations would be coded as:

School districts with STR not greater than the average STR = 0

School districts with greater than the average STR = 1

32. What is the predicted test score for school districts that have a higher-than-average student-teacher ratio? Place your answer in the box and show your work in the space provided. Round the estimates to the nearest whole number prior to calculating.

Predicted test score = 658 + (-8)*1 = 650
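With a dummy regressor, the prediction has only two possible values: the intercept for the group coded 0, and the intercept plus the coefficient for the group coded 1. A minimal Python sketch using the rounded estimates from the answer (the function name is illustrative):

```python
beta0_hat = 658  # rounded intercept: predicted score when above_avg_str = 0
beta1_hat = -8   # rounded coefficient on above_avg_str

def predict_testscr(above_avg_str: int) -> int:
    """Predicted test score for a district with the given dummy value (0 or 1)."""
    return beta0_hat + beta1_hat * above_avg_str

low_str_prediction = predict_testscr(0)   # districts at or below the average STR
high_str_prediction = predict_testscr(1)  # districts above the average STR

print(low_str_prediction, high_str_prediction)
```

The coefficient on the dummy is the difference between the two group predictions.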

33. Schools with an above average student-teacher ratio are expected to have _______ test scores as those with a below average student-teacher ratio. (For this question, “brag” about the “best” evidence/significance level as discussed in class.)

☐ Statistically the same

☐ Statistically higher at the 10% level

☐ Statistically higher at the 5% level

☐ Statistically higher at the 1% level

☐ Statistically lower at the 10% level

☐ Statistically lower at the 5% level

☐ Statistically lower at the 1% level

34. When a standard error gets larger and the coefficient stays the same,

☐ the coefficient becomes less statistically significant

☐ the coefficient becomes more statistically significant

☐ the t-statistic stays the same

☐ statistical significance does not change
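The link between the standard error, the t-statistic and significance can be sketched in Python. The coefficient of 20 and the three standard errors are hypothetical values chosen for illustration. The critical values 1.645, 1.96 and 2.576 are the two-sided large-sample values mentioned earlier in the key.

```python
def t_statistic(coefficient: float, standard_error: float) -> float:
    """t-statistic for the null hypothesis that the true coefficient is 0."""
    return coefficient / standard_error

def significance_label(t: float) -> str:
    """Classify |t| against two-sided large-sample critical values."""
    abs_t = abs(t)
    if abs_t >= 2.576:
        return "significant at the 10%, 5% and 1% levels"
    if abs_t >= 1.96:
        return "significant at only the 10% and 5% levels"
    if abs_t >= 1.645:
        return "significant at only the 10% level"
    return "statistically insignificant"

coefficient = 20.0  # hypothetical estimate, held fixed
for se in (5.0, 10.0, 20.0):  # hypothetical standard errors, increasing
    t = t_statistic(coefficient, se)
    print(f"SE = {se:>4}: t = {t:.2f} -> {significance_label(t)}")
```

Holding the coefficient fixed, each larger standard error shrinks the t-statistic and moves the result toward insignificance.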

35. Suppose a coefficient has a p-value equal to 0.04. Interpret this value in words. (You can use any of the 3 interpretations provided in class. I am NOT looking for whether this means the coefficient is statistically significant. Instead, how do you interpret (in words) a p-value equal to 0.04?)

· The probability the true (population) beta1 equals 0 is 0.04 (technically based on my model and data, but I didn’t emphasize this enough)

· The probability I am wrong when I say it’s statistically significant (or that there IS an effect/relationship) is 0.04

· The probability of observing a beta1hat as far away from 0 as I did (or further) IF the true beta1 actually DOES =0 is 0.04

Any of the 3 above are acceptable.
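The third interpretation corresponds to a tail-probability calculation. The Python sketch below assumes the large-sample normal approximation for the t-statistic; the value t = 2.054 is hypothetical, picked because it gives a p-value close to 0.04.

```python
import math

def two_sided_p_value(t: float) -> float:
    """P(|Z| >= |t|) for a standard normal Z, computed as erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2))

t_hypothetical = 2.054  # hypothetical t-statistic, not from the exam
p = two_sided_p_value(t_hypothetical)

print(f"t = {t_hypothetical}, two-sided p-value = {p:.3f}")
```

If the true coefficient were 0, an estimate at least this far from 0 would be observed in about 4% of samples.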

36. If I run a regression in Stata using the command:

reg wage educ

and run another regression using the command:

reg wage educ, r

What is the difference between the two regressions? (1 sentence)

The second asks for robust standard errors.
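To illustrate what changes between the two commands, the Python sketch below computes both kinds of standard error by hand for a one-regressor model. The data are made up. The robust formula shown is the HC1 form, with an n/(n - 2) small-sample adjustment. This is a hand calculation for intuition, not a reproduction of Stata output.

```python
import math

# Hypothetical (made-up) data: years of education and hourly wage.
educ = [8, 10, 12, 12, 14, 16, 16, 18]
wage = [9.0, 11.5, 12.0, 15.0, 14.0, 22.0, 17.0, 30.0]

n = len(educ)
x_bar = sum(educ) / n
y_bar = sum(wage) / n
sxx = sum((x - x_bar) ** 2 for x in educ)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(educ, wage))

# The coefficient estimates are the same with or without robust standard errors.
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar
residuals = [y - (beta0_hat + beta1_hat * x) for x, y in zip(educ, wage)]

# Conventional (homoskedasticity-only) standard error of the slope.
s2 = sum(u ** 2 for u in residuals) / (n - 2)
se_conventional = math.sqrt(s2 / sxx)

# Heteroskedasticity-robust (HC1) standard error of the slope.
weighted = sum(((x - x_bar) ** 2) * (u ** 2) for x, u in zip(educ, residuals))
se_robust = math.sqrt((n / (n - 2)) * weighted / sxx ** 2)

print(f"slope = {beta1_hat:.3f}")
print(f"conventional SE = {se_conventional:.3f}, robust SE = {se_robust:.3f}")
```

Only the standard errors (and therefore the t-statistics and p-values) differ between the two calculations; the slope and intercept are identical.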

37. Robust standard errors _________ trustworthy for hypothesis testing when the assumption of homoskedasticity is true/valid.

☐ are

☐ are not