MPA 2455 - Statistics Problem Set #1 Summer 2024

Published: 2024-06-25



QUESTION 1 – COLLEGES AND INTERGENERATIONAL MOBILITY

Data released in 2017 by Opportunity Insights estimate the joint distribution of parents’ and kids’ incomes for all colleges in the United States. This question asks you to explore these data for a college of your choice. You may download these data in “College Mobility.xls”.

(a) Choose a college to analyze, and report both the name and ID number (“super_opeid”). NOTE: You may work with your group on this question (as others), but you must choose your own college that is different from others in your group.

(b) Lay out a probability table that you could use to calculate various marginal, joint, and conditional probabilities in this setting. Note that you do not need to fill in the table.

1) P(Parents in Bottom 20%)

2) P(Kids in Top 20% | Parents in Bottom 20%)

3) P(Kids in Top 20% AND Parents in Bottom 20%)

4) P(Kids in Top 20% AND Parents in Top 20%)

5) P(Kids in Top 20%)

The paper analyzing these data focused on upward mobility from the bottom quintile to the top quintile, but there are many other potential definitions of mobility.

(d) Calculate P(Kids in Top 40% | Parents in Bottom 40%) for your chosen college. Be sure to show the formula you used for this calculation, in terms of the statistics provided in the dataset.

(e) Repeat your calculation in (d) for all colleges, and calculate the correlation between your new measure of upward mobility and P(Kids in Top 20% | Parents in Bottom 20%) across colleges.

(f) In a short paragraph (100 words), comment on the implications of your result in (e).
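One way to organize the calculation in part (d) is sketched below. It assumes the dataset reports, for each college, the share of parents in each income quintile and the kids' quintile shares conditional on the parents' quintile. The function name, the dictionary layout, and all numbers are hypothetical placeholders, not values or column names from “College Mobility.xls”.

```python
def p_kid_top40_given_par_bottom40(par_share, kid_given_par):
    """P(Kids in Top 40% | Parents in Bottom 40%).

    par_share[p]          = P(parent in quintile p)
    kid_given_par[(k, p)] = P(kid in quintile k | parent in quintile p)
    Quintile 1 is the bottom 20%, quintile 5 the top 20%.
    """
    # Joint probability: sum over the four (kid, parent) cells,
    # each built as conditional * marginal.
    joint = sum(
        kid_given_par[(k, p)] * par_share[p]
        for p in (1, 2)
        for k in (4, 5)
    )
    # Marginal probability that parents are in the bottom 40%.
    marginal = par_share[1] + par_share[2]
    return joint / marginal


# Made-up illustrative inputs (NOT from the dataset).
toy_par_share = {1: 0.10, 2: 0.15}
toy_kid_given_par = {(4, 1): 0.20, (5, 1): 0.30, (4, 2): 0.25, (5, 2): 0.35}

toy_result = p_kid_top40_given_par_bottom40(toy_par_share, toy_kid_given_par)
print(round(toy_result, 4))
```

The result is a weighted average of the two within-quintile conditional probabilities, with weights given by the parent-quintile shares, which is why the marginal shares are needed and a simple average of the conditionals is not enough.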

QUESTION 2 – COVID-19 INFECTION

Many epidemiologists believe that testing is the only approach to long-term virus control; even with the vaccine now available around the world, new (and potentially more lethal) variants may emerge against which some or all of the vaccines are ineffective. But what can one learn from these tests? We examine this issue in the context of Rhode Island. Suppose the prevalence of COVID-19 is about 5,000 cases out of 1,000,000 people in the state (the cumulative number of people infected over the last few months who are still infected, not new cases per day). Suppose that the test for detecting COVID-19 yields a positive result for 99 percent of those actually infected, and a negative result for 95 percent of those actually not infected. (These are approximate rates for rapid tests.)

(a) Use Bayes’ Rule to calculate the probability that a person who tests positive is infected (i.e. P(COVID | +)). This number is usually referred to as the positive predictive value of the test.

(b) Use Bayes’ Rule to calculate the probability that a person who tests negative is not infected (i.e. P(NO COVID | – )). This number is usually referred to as the negative predictive value of the test.

(c) Suppose that the total number of COVID cases in Rhode Island has fallen from current rates by September. How would the positive and negative predictive values of the test change?

(d) Read the following passage about testing accuracy issues in the context of HIV:

"Two weeks ago, a 3-year-old child in Winston Salem, North Carolina, was struck by a car and rushed to a nearby hospital. Because the child's skull had been broken and there was a blood spill, the hospital performed an HIV test. As the traumatized mother was sitting at her child's bedside, a doctor came in and told her the child was HIV-positive. Both parents are negative. The doctor told the mother that she needed to launch an investigation into her entire family and circle of friends because this child had been sexually abused. There was no other way, the doctor said, that the child could be positive. A few days later, the mother demanded a second test. It came back negative. The hospital held a press conference where a remarkable admission was made. In her effort to clear the hospital of any wrongdoing, a hospital spokesperson announced that 'these HIV tests are not reliable; a lot of factors can skew the tests, like fever or pregnancy. Everybody knows that.'"

Celia Farber, Impression Magazine, June 21, 1999. Reported by Christine Maggiore: Is the “AIDS test” Accurate? (http://healtoronto.com/testcm.html)

Write a short paragraph (100-200 words) to the hospital commenting on whether the claims made by the doctor and the hospital spokesperson were sound. The letter should be written in language that the head of the hospital (who is intelligent and educated, but not well-versed in statistics) can understand.
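The Bayes' Rule calculations in parts (a) to (c) of Question 2 can be sketched as follows. The function name is an illustrative choice, and the lower prevalence of 0.001 used for part (c) is a made-up value chosen only to show the direction of the change; the other inputs come from the problem statement.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a diagnostic test using Bayes' Rule.

    prevalence  = P(infected)
    sensitivity = P(+ | infected)
    specificity = P(- | not infected)
    """
    # Total probability of a positive test (true plus false positives).
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    p_neg = 1 - p_pos
    ppv = sensitivity * prevalence / p_pos          # P(infected | +)
    npv = specificity * (1 - prevalence) / p_neg    # P(not infected | -)
    return ppv, npv


# Inputs from the problem statement: 5,000 cases per 1,000,000 people.
ppv, npv = predictive_values(5_000 / 1_000_000, sensitivity=0.99, specificity=0.95)
print(f"PPV = {ppv:.4f}, NPV = {npv:.6f}")

# Part (c): a hypothetical lower prevalence (0.001 is an assumed value).
ppv_low, npv_low = predictive_values(0.001, sensitivity=0.99, specificity=0.95)
print(f"PPV = {ppv_low:.4f}, NPV = {npv_low:.6f}")
```

The same function applies to the HIV passage in part (d): when the condition is rare in the tested population, even a small false-positive rate can leave the positive predictive value low.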

QUESTION 3 – HURRICANE PREPARATION

You are tasked by FEMA to estimate expected hurricane damage in the upcoming year.

The data file “StormData.xlsx” contains information on the 162 Atlantic hurricanes that have made landfall in the United States from 1900 – 2017.

Variables in the dataset include:

• BASE DAMAGE ($): Estimated damage from the storm in the United States in the year the storm made landfall.

• CURRENT DAMAGE ($ 2021): Estimated damage from the storm if the storm struck in 2021. Current damage adjusts base damage for inflation, as well as changes in coastal population and changes in the value of coastal property.

• DAMAGE RANK: Ordinal ranking based on current damage.

• CATEGORY AT LANDFALL: Category on the Saffir-Simpson hurricane scale when the storm made landfall in the United States (http://www.nhc.noaa.gov/aboutsshws.php). Note: “TS” denotes a hurricane that made landfall as a tropical storm.

• WINDS AT LANDFALL: Maximum sustained wind speed when the storm made landfall on the United States.

(a) Major hurricanes are classified as Category 3, 4 or 5 hurricanes and are considered especially dangerous. How much damage should we expect if a major hurricane makes landfall in the U.S.? (Note this includes all damage done by a given storm, including 2nd landings.)

(b) If we use the hurricanes from 1900-2017 as a guide, how many hurricanes should we expect in the coming year? How much damage should we expect nationwide? (Hint: first create a table that sums the number of hurricanes and hurricane damage by year)

(c) Using the past as a guide, plot the distribution of hurricane damage that might hit in the upcoming seasons. You should do this separately for each of the three regions.
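The hint in part (b) of Question 3 (summing hurricane counts and damage by year) can be sketched as below. The records are made-up toy values, not rows from “StormData.xlsx”, and the five-year window is only for illustration; the problem's own window is 1900 to 2017. The variable names are hypothetical.

```python
from collections import defaultdict

# Made-up toy records: (landfall year, current damage in dollars).
storms = [(1900, 2.0e9), (1900, 5.0e8), (1903, 1.0e9)]
first_year, last_year = 1900, 1904  # toy window, NOT the problem's window

# Table of hurricane counts and total damage by year.
count_by_year = defaultdict(int)
damage_by_year = defaultdict(float)
for year, damage in storms:
    count_by_year[year] += 1
    damage_by_year[year] += damage

# Divide by ALL years in the window, including years with no landfall,
# so that quiet years count as zeros in the average.
n_years = last_year - first_year + 1
expected_count = sum(count_by_year.values()) / n_years
expected_damage = sum(damage_by_year.values()) / n_years

print(expected_count, expected_damage)
```

The design point is the denominator: averaging only over years that appear in the storm list would overstate both the expected number of hurricanes and the expected damage per year.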

QUESTION 4 – AIRBAG RISK

You are an analyst in the risk modelling department at Tesla. The director of production recently identified a serious flaw in their manufacturing processes that has resulted in cars shipping with faulty airbags over the last month.

During the last month, the factory produced only two types of cars: the Model S (40% of all cars shipped) and the Model Y (60% of all cars shipped). The director of production believes that only a quarter of all cars shipped had a faulty airbag installed, but that the specific nature of the assembly process implies that whether a faulty airbag appears in a Model S or Model Y car is completely random.

(a) Create a two-by-two table that describes the joint (and marginal) probability distribution of Tesla car models and faulty airbags.

(b) It just so happens that Prof Bruhn bought a Model S last month. What is the probability he has a faulty airbag?
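A minimal sketch of the table in Question 4 follows. It reads “completely random” as statistical independence between car model and airbag fault, which is an interpretation of the problem text rather than something stated in those words. Under that assumption each joint probability is the product of the two marginals.

```python
# Marginal probabilities from the problem statement.
p_model = {"Model S": 0.40, "Model Y": 0.60}
p_airbag = {"Faulty": 0.25, "Not faulty": 0.75}

# ASSUMPTION: model and fault are independent, so joint = product of marginals.
joint = {
    (model, status): pm * pa
    for model, pm in p_model.items()
    for status, pa in p_airbag.items()
}

for (model, status), p in joint.items():
    print(f"P({model} AND {status}) = {p:.2f}")

# Conditional probability for part (b): P(Faulty | Model S).
p_faulty_given_s = joint[("Model S", "Faulty")] / p_model["Model S"]
print(f"P(Faulty | Model S) = {p_faulty_given_s:.2f}")
```

Under independence the conditional probability equals the marginal probability of a fault, so knowing the model adds no information about the airbag.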

QUESTION 5 – EUROPEAN PARLIAMENT

At the end of May 2019, voters across Europe selected 751 Members of the European Parliament (MEPs) to serve a 5-year term. Although there are many different specific parties, the MEPs belong to several political blocs. Consider the following table showing the distribution of MEPs by political bloc and region of origin.

Region      Conservative   Social Dem.   Liberal   Populist   Other
Northwest   112            131           76        58         5
South       74             65            10        58         9
East        93             34            19        0          7

(a) What is the probability that an MEP hails from Eastern Europe and identifies with the Social Dem. bloc?

(b) What is the probability that a populist MEP comes from the Northwest?

(d) Suppose you call 4 random MEPs from the Liberal bloc. What is the probability that at least one of them is from the South?
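The three probabilities asked for in Question 5 can be read off the table as sketched below. The helper names are illustrative. For the last part, the sketch assumes the 4 MEPs are distinct people drawn without replacement; if calls could repeat the same person, the binomial form 1 - (95/105)^4 would apply instead.

```python
from math import comb

blocs = ["Conservative", "Social Dem.", "Liberal", "Populist", "Other"]
counts = {
    "Northwest": [112, 131, 76, 58, 5],
    "South": [74, 65, 10, 58, 9],
    "East": [93, 34, 19, 0, 7],
}
total = sum(sum(row) for row in counts.values())  # all MEPs in the table


def cell(region, bloc):
    """Number of MEPs from `region` in `bloc`."""
    return counts[region][blocs.index(bloc)]


def bloc_total(bloc):
    """Number of MEPs in `bloc` across all regions."""
    return sum(cell(region, bloc) for region in counts)


# Joint probability: East AND Social Dem.
p_east_and_socdem = cell("East", "Social Dem.") / total

# Conditional probability: Northwest GIVEN Populist.
p_nw_given_populist = cell("Northwest", "Populist") / bloc_total("Populist")

# At least one of 4 Liberal MEPs is from the South
# (ASSUMPTION: sampling without replacement), via the complement rule.
n_lib = bloc_total("Liberal")
n_lib_not_south = n_lib - cell("South", "Liberal")
p_at_least_one_south = 1 - comb(n_lib_not_south, 4) / comb(n_lib, 4)

print(p_east_and_socdem, p_nw_given_populist, p_at_least_one_south)
```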