STATS 306 Midterm Practice Questions
True/False and Multiple Choice
More than one answer may be correct.
1. T/F: A categorical variable assumes one of a few discrete values, and is usually saved in R as a factor or character vector.
2. T/F: If df is a tibble with a column called a, the following two commands produce the same output:
• select(df, a)
• df$a
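One way to explore question 2 is to run both commands on a small tibble and compare what comes back. The sketch below uses a made-up tibble; its column values are illustrative assumptions, not course data.

```r
library(dplyr)

# Toy tibble (hypothetical values, for illustration only)
df <- tibble(a = 1:3, b = c("x", "y", "z"))

sel <- select(df, a)  # result of the first command
vec <- df$a           # result of the second command

class(sel)            # inspect what kind of object each command returns
class(vec)
identical(sel, vec)   # TRUE only if the two results are exactly the same object
```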
3. The command df %>% filter(x > 100, y < 20) is equivalent to which of the following:
a. filter(df, x > 100, y < 20)
b. filter(df, x > 100 & y < 20)
c. filter(df, x > 100) %>% filter(df, y < 20)
d. filter(df, x > 100) & filter(df, y < 20)
e. df %>% filter(x > 100) %>% filter(y < 20)
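A quick way to test any candidate in question 3 is to compare its result with the piped command on toy data. This is a minimal sketch: the tibble is hypothetical, and option (b) is used only as an example of a candidate to swap in.

```r
library(dplyr)

# Hypothetical toy data chosen so that some rows pass each condition
df <- tibble(x = c(50, 150, 200, 300), y = c(10, 30, 5, 15))

# The command from the question
reference <- df %>% filter(x > 100, y < 20)

# Swap in any candidate from the list; option (b) is shown here
candidate <- filter(df, x > 100 & y < 20)

# TRUE when the candidate returns the same rows as the reference
same_result <- isTRUE(all.equal(reference, candidate))
same_result
```

Candidates that are not valid dplyr calls will raise an error rather than return FALSE, which is itself informative.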
4. I wish to sort df in descending order of column a, with ties broken by column b sorted in ascending order. Which of these commands would work?
a. df %>% arrange(-a, b)
b. df %>% arrange(b, desc(a))
c. arrange(df, -a, -b)
d. arrange(df, desc(a), b)
e. None of these.
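To check a candidate for question 4, it helps to use toy data that contains a tie in column a, so the tie-breaking rule is visible. The sketch below assumes a made-up tibble and shows one candidate, option (d); the others can be substituted in the same way.

```r
library(dplyr)

# Hypothetical data with a tie in column a (the two rows where a == 2)
df <- tibble(a = c(1, 2, 2, 3), b = c(4, 9, 1, 7))

# Option (d): sort by a descending, then by b ascending within ties
sorted <- arrange(df, desc(a), b)
sorted
```

Note that the unary minus form (for example `-a`) relies on the column being numeric, whereas `desc()` also works on character and factor columns.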
5. I wish to drop all columns in df except for column a. Which of these commands would work?
a. df %>% select(a)
b. select(df, -a)
c. filter(df, -a)
d. filter(df, a)
e. None of these.
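For question 5, inspecting the column names that each command leaves behind makes the difference between selecting and excluding a column concrete. A minimal sketch on a hypothetical three-column tibble:

```r
library(dplyr)

# Hypothetical tibble with three columns
df <- tibble(a = 1:3, b = 4:6, c = 7:9)

kept_by_select_a     <- names(select(df, a))   # columns left after select(df, a)
kept_by_select_not_a <- names(select(df, -a))  # columns left after select(df, -a)

kept_by_select_a
kept_by_select_not_a
```

`filter()` operates on rows rather than columns, so it is worth trying options (c) and (d) separately to see how they behave.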
Problems
1. The gss_cat dataset contains survey information from the General Social Survey. I used this dataset to create the following visualization showing the number of married versus non-married survey respondents:
[Figure: bar chart titled "Married vs. non-married respondents in GSS". x-axis: marital == "Married" (FALSE, TRUE); y-axis ticks run from 10000 to 11500. Source: General Social Survey.]
a. In your opinion, is this an honest plot? Why or why not?
b. Generate a plot that more accurately depicts the number of married and non-married respondents in this survey.
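One possible approach to part (b), sketched under the assumption that gss_cat is loaded from the forcats package: count the two groups and draw the bars with `geom_col()`, which anchors the y-axis at zero by default so bar heights are proportional to the counts. The variable names `married_counts` and `p` are illustrative.

```r
library(dplyr)
library(ggplot2)
library(forcats)  # assumed source of gss_cat

# Count respondents by whether they report being married
married_counts <- gss_cat %>%
  count(married = marital == "Married")

# Bars start at zero, so their heights can be compared directly
p <- ggplot(married_counts, aes(x = married, y = n)) +
  geom_col() +
  labs(
    title = "Married vs. non-married respondents in GSS",
    x = 'marital == "Married"',
    y = "Number of respondents",
    caption = "Source: General Social Survey"
  )
p
```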
2. Recreate the following plot, which shows how many miles were flown (across all departures) out of NYC-area airports by the top five most popular models of airplane:
[Figure: chart titled "Distances flown by different models of airplane (Originating from JFK, LGA, or EWR)". Axis "Aircraft model" with categories EMB-145LR, 737-832, 757-222, 737-824, A320-232.]
3. The following table shows the proportion of flights which were delayed (either departure or arrival) by more than 60 minutes each month. Fill in the missing entries by providing code that generates the complete table.
month   p_delayed
Jan     0.0795894
Feb     —
Mar     —
Apr     —
May     0.0940700
Jun     —
Jul     0.1625844
Aug     —
Sep     0.0567197
Oct     —
Nov     —
Dec     0.1164693
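A possible starting point for this problem is sketched below. It assumes the flights table comes from the nycflights13 package, and it treats a flight as delayed when either delay exceeds 60 minutes. How rows with missing delays are handled is also an assumption (here they are dropped when the comparison evaluates to NA), so the result should be checked against the entries already given in the table, for example Jan = 0.0795894.

```r
library(dplyr)
library(nycflights13)  # assumed source of the flights table

delay_table <- flights %>%
  # TRUE if either delay exceeds 60 minutes; may be NA when a delay is missing
  mutate(delayed = dep_delay > 60 | arr_delay > 60) %>%
  group_by(month) %>%
  # Proportion of delayed flights; NA handling is an assumption to verify
  summarize(p_delayed = mean(delayed, na.rm = TRUE)) %>%
  # Replace month numbers 1..12 with the abbreviations used in the table
  mutate(month = month.abb[month])

delay_table
```

If the known entries do not match, the most likely adjustment is the treatment of cancelled flights or other rows with missing delay values.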
4. Recall the a2weather dataset that we studied in lectures 4 and 5:
load(url("https://datasets.stats306.org/a2weather.RData"))
a) Here is a plot of the daily maximum temperature in Ann Arbor:
[Figure: plot titled "Daily maximum temperature in Ann Arbor, 1890-present".]
Suppose you are a climate researcher studying long-term warming trends in the Ann Arbor climate. Do you think this is a useful plot? Why or why not?
b) Produce a plot that more effectively depicts the long-term trend.
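One common idea for part (b) is to aggregate the daily values to one number per year, so that seasonal swings no longer hide the slow trend. The sketch below illustrates this on simulated data only: the tibble `toy` and its columns `date` and `tmax` are hypothetical stand-ins, since the actual column names in a2weather are not given here and would need to be substituted.

```r
library(dplyr)
library(ggplot2)

# Simulated daily series (NOT the real a2weather data): seasonal cycle,
# a small upward drift, and random noise
set.seed(1)
dates <- seq(as.Date("2000-01-01"), as.Date("2009-12-31"), by = "day")
toy <- tibble(
  date = dates,
  tmax = 15 +
    12 * sin(2 * pi * as.numeric(format(dates, "%j")) / 365) +
    0.1 * (as.numeric(format(dates, "%Y")) - 2000) +
    rnorm(length(dates), sd = 4)
)

# Collapse to one average per year to remove the seasonal cycle
yearly <- toy %>%
  group_by(year = as.integer(format(date, "%Y"))) %>%
  summarize(mean_tmax = mean(tmax))

# Yearly averages with a fitted straight line to show the trend
p <- ggplot(yearly, aes(x = year, y = mean_tmax)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Year", y = "Mean daily maximum temperature")
p
```

With the real data, missing observations would likely need `na.rm = TRUE` inside `mean()`, and other summaries (such as a rolling average) are equally reasonable choices.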
5. The storms table contains hourly information on different tropical storms and hurricanes that have hit the United States in the past ~50 years. For example, here is the information on Hurricane Sandy:
filter(storms, year == 2012, name == "Sandy") %>% print(n = 15)
## # A tibble: 33 rows (only the status column is legible in this copy)
##   status
##   <chr>
##   tropical
##   tropical
##   tropical
##   tropical
##   tropical
##   tropical
##   tropical
##   tropical
##   hurricane
##   hurricane
##   hurricane
##   hurricane
##   hurricane
##   hurricane
##   hurricane
## # ... with 18 more rows, 2 more variables: tropicalstorm_force_diameter <int>,
## #   hurricane_force_diameter <int>, and abbreviated variable names 1: category,
## #   2: pressure
The table shows that Sandy reached peak intensity between 5-6am on October 25, when it became a Category 3 hurricane. Overall, the number of hurricanes in 2012 which reached a peak intensity of Category 3 was two; the other one was Hurricane Michael.
Complete the table shown below, which records the number of Category 1–5 hurricanes in each decade, where each hurricane is categorized according to its peak intensity:
decade      1    2    3    4    5
1970-1980   8    6    2    1    2
1980-1990   —    —    —    6    —
1990-2000   —    —    9    —    2
2000-2010   —    9    —    14   —
2010-2020   —    11   —    —    5
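A possible outline for this problem: find each storm's peak category, assign it to a decade, then count and spread the counts into columns. The sketch assumes the storms table shipped with dplyr, identifies a storm by its (year, name) pair, and converts category to an integer so the code works whether the column is stored as a number or as a factor. The output depends on the dataset version, so it should be checked against the entries already filled in above.

```r
library(dplyr)
library(tidyr)

# Peak category reached by each storm, keeping only hurricane-strength rows
peak_by_storm <- storms %>%
  mutate(cat_num = as.integer(as.character(category))) %>%
  filter(!is.na(cat_num), cat_num >= 1) %>%
  group_by(year, name) %>%   # assumption: (year, name) identifies one storm
  summarize(peak = max(cat_num), .groups = "drop")

# Count storms per decade and peak category, one column per category
decade_table <- peak_by_storm %>%
  mutate(
    decade_start = 10 * (year %/% 10),
    decade = paste0(decade_start, "-", decade_start + 10)
  ) %>%
  count(decade, peak) %>%
  pivot_wider(
    names_from = peak, values_from = n,
    values_fill = 0L, names_sort = TRUE
  )

decade_table
```

A storm whose track spans two calendar years would be split by this grouping, and newer versions of the dataset may include years beyond 2020, which would add a row not present in the table.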
2023-02-18