
STATS 306

Midterm Practice Questions

True/False and Multiple Choice

More than one answer may be possible.

1. T/F: A categorical variable assumes one of a few discrete values, and is usually saved in R as a factor or character vector.

2. T/F: If df is a tibble with a column called a, the following two commands produce the same output:

•  select(df, a)

•  df$a
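A claim like this can be examined by running both commands on a small tibble and inspecting what each one returns. The sketch below assumes a made-up tibble (the columns and values are invented for illustration, not course data) and leaves the comparison to the reader.

```r
library(dplyr)

# Toy tibble, invented for illustration only
df <- tibble(a = c(1, 2, 3), b = c("x", "y", "z"))

by_select <- select(df, a)  # result of the dplyr verb
by_dollar <- df$a           # result of base R extraction

# Inspect the type of each result, then compare them directly
class(by_select)
class(by_dollar)
identical(by_select, by_dollar)
```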

3. The command df %>% filter(x > 100, y < 20) is equivalent to which of the following:

a.  filter(df, x > 100, y < 20)

b.  filter(df, x > 100 & y < 20)

c.  filter(df, x > 100) %>% filter(df, y < 20)

d.  filter(df, x > 100) & filter(df, y < 20)

e.  df %>% filter(x > 100) %>% filter(y < 20)
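Equivalence questions of this kind can be tested empirically. The following is a minimal sketch on a made-up tibble (the values are an assumption chosen so that each condition keeps some rows and drops others); it compares a few of the candidate rewrites against the piped command. The remaining options can be checked the same way.

```r
library(dplyr)

# Toy data, invented for illustration only
df <- tibble(x = c(50, 150, 200, 300), y = c(10, 5, 30, 15))

# The command from the question
piped <- df %>% filter(x > 100, y < 20)

# Some candidate rewrites
cand_a <- filter(df, x > 100, y < 20)
cand_b <- filter(df, x > 100 & y < 20)
cand_e <- df %>% filter(x > 100) %>% filter(y < 20)

# Compare each candidate with the piped result
all.equal(piped, cand_a)
all.equal(piped, cand_b)
all.equal(piped, cand_e)
```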

4. I wish to sort df in descending order of column a, with ties broken by column b sorted in ascending order. Which of these commands would work?

a.  df %>% arrange(-a, b)

b.  df %>% arrange(b, desc(a))

c.  arrange(df, -a, -b)

d.  arrange(df, desc(a), b)

e.  None of these.
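Sorting rules with tie-breaking are easiest to check on data that contains ties. The sketch below uses a made-up tibble (an assumption for illustration) and applies one candidate command; the others can be swapped in and compared against the ordering the question asks for.

```r
library(dplyr)

# Toy data with ties in `a`, invented for illustration only
df <- tibble(a = c(2, 1, 2, 3, 1), b = c(9, 4, 3, 7, 8))

# One candidate: descending in a, then ascending in b within ties
sorted <- df %>% arrange(desc(a), b)
sorted
```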

5. I wish to drop all columns in df except for column a. Which of these commands would work?

a.  df %>% select(a)

b.  select(df, -a)

c.  filter(df, -a)

d.  filter(df, a)

e.  None of these.
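To see which columns a selection keeps, it can help to look only at the column names of the result. This is a small sketch on a made-up three-column tibble (an assumption for illustration) contrasting a positive and a negative selection.

```r
library(dplyr)

# Toy tibble, invented for illustration only
df <- tibble(a = 1:3, b = 4:6, z = 7:9)

positive <- names(select(df, a))   # columns kept by a positive selection
negative <- names(select(df, -a))  # columns kept by a negative selection

positive
negative
```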

Problems

1. The gss_cat dataset contains survey information from the General Social Survey. I used this dataset to create the following visualization showing the number of married versus non-married survey respondents:

[Figure: bar chart titled "Married vs. nonmarried respondents in GSS". x-axis: marital == "Married" (FALSE, TRUE); y-axis ticks run from 10000 to 11500. Source: General Social Survey]

a. In your opinion, is this an honest plot? Why or why not?

b. Generate a plot that more accurately depicts the number of married and non-married respondents in this survey.
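One possible approach to part b, offered as a sketch rather than the intended answer, is to let geom_bar() count the rows itself, since it draws bars from a baseline of zero by default. The code assumes gss_cat is loaded from the forcats package; the y-axis label is an addition not present in the original figure.

```r
library(dplyr)
library(forcats)   # provides gss_cat
library(ggplot2)

married_plot <- gss_cat %>%
  mutate(married = marital == "Married") %>%   # TRUE for married respondents
  ggplot(aes(x = married)) +
  geom_bar() +                                 # bars start at zero by default
  labs(
    title = "Married vs. nonmarried respondents in GSS",
    x = 'marital == "Married"',
    y = "Number of respondents",               # label chosen for this sketch
    caption = "Source: General Social Survey"
  )

married_plot
```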

2. Recreate the following plot, which shows how many miles were flown (across all departures) out of NYC-area airports by the top five most popular models of airplane:

 

[Figure: chart titled "Distances flown by different models of airplane (Originating from JFK, LGA, or EWR)". x-axis: Aircraft model, in the order EMB-145LR, 737-832, 757-222, 737-824, A320-232]
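A rough sketch of one way to approach this problem follows. It rests on several assumptions that the question does not confirm: that the data come from the flights and planes tables of the nycflights13 package, that "most popular" means the models with the most departures, and that the bars are ordered by total miles. Under a different definition of popularity the five models selected may differ from those in the figure.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)  # assumed data source: flights and planes tables

top_models <- flights %>%
  # Attach the aircraft model to each flight via the tail number
  inner_join(select(planes, tailnum, model), by = "tailnum") %>%
  group_by(model) %>%
  summarize(n_flights = n(), total_miles = sum(distance), .groups = "drop") %>%
  # Assumption: "most popular" means the most departures
  slice_max(n_flights, n = 5, with_ties = FALSE)

miles_plot <- ggplot(top_models,
                     aes(x = reorder(model, total_miles), y = total_miles)) +
  geom_col() +
  labs(
    title = "Distances flown by different models of airplane",
    subtitle = "(Originating from JFK, LGA, or EWR)",
    x = "Aircraft model",
    y = "Total miles flown"   # label chosen for this sketch
  )

miles_plot
```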

3. The following table shows the proportion of flights which were delayed (either departure or arrival) by more than 60 minutes each month. Fill in the missing entries by providing code that generates the complete table.

month    p_delayed
Jan      0.0795894
Feb
Mar
Apr
May      0.0940700
Jun
Jul      0.1625844
Aug
Sep      0.0567197
Oct
Nov
Dec      0.1164693
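A possible starting point, sketched under assumptions: the table is taken to come from the flights table of nycflights13, and a flight is treated as delayed when either its departure or its arrival delay exceeds 60 minutes. How missing delays should be handled is not stated in the question, so the choice below is a guess, and the result may not match the printed values exactly.

```r
library(dplyr)
library(nycflights13)  # assumed data source: flights table

delay_by_month <- flights %>%
  group_by(month) %>%
  summarize(
    # Comparisons that evaluate to NA (missing delays) are dropped by na.rm;
    # this handling of missing values is an assumption
    p_delayed = mean(dep_delay > 60 | arr_delay > 60, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(month = month.abb[month])   # 1..12 -> "Jan".."Dec"

delay_by_month
```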

4. Recall the a2weather dataset that we studied in lectures 4 and 5:

load(url("https://datasets.stats306.org/a2weather.RData"))

a) Here is a plot of the daily maximum temperature in Ann Arbor:

[Figure: plot titled "Daily maximum temperature in Ann Arbor, 1890-present"]

Suppose you are a climate researcher studying long-term warming trends in the Ann Arbor climate. Do you think this is a useful plot? Why or why not?

b) Produce a plot that more effectively depicts the long-term trend.
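One general technique for exposing a slow trend in noisy daily data is to aggregate to yearly means before plotting. The sketch below illustrates that idea on synthetic data only: the tibble weather and its columns date and tmax are hypothetical stand-ins, not the actual structure of a2weather, and the drift built into the data is made up.

```r
library(dplyr)
library(ggplot2)

# Synthetic stand-in for daily weather data; `date` and `tmax` are
# hypothetical column names, not taken from a2weather
set.seed(1)
dates <- seq(as.Date("1890-01-01"), as.Date("2019-12-31"), by = "day")
doy <- as.numeric(format(dates, "%j"))   # day of year
yr  <- as.numeric(format(dates, "%Y"))   # calendar year
weather <- tibble(
  date = dates,
  tmax = 15 - 14 * cos(2 * pi * doy / 365) +  # seasonal cycle
    0.01 * (yr - 1890) +                      # small made-up upward drift
    rnorm(length(dates), sd = 4)              # day-to-day noise
)

# Collapse the seasonal cycle by averaging within each year
yearly <- weather %>%
  mutate(year = as.numeric(format(date, "%Y"))) %>%
  group_by(year) %>%
  summarize(mean_tmax = mean(tmax, na.rm = TRUE), .groups = "drop")

trend_plot <- ggplot(yearly, aes(x = year, y = mean_tmax)) +
  geom_line() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # linear trend line
  labs(x = "Year", y = "Mean daily maximum temperature")

trend_plot
```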

5. The storms table contains hourly information on different tropical storms and hurricanes that have hit the United States in the past ~50 years. For example, here is the information on Hurricane Sandy:

filter(storms, year == 2012, name == "Sandy") %>% print(n=15)

## # A tibble: 33
##   status
##   <chr>
##   tropical
##   tropical
##   tropical
##   tropical
##   tropical
##   tropical
##   tropical
##   tropical
##   hurricane
##   hurricane
##   hurricane
##   hurricane
##   hurricane
##   hurricane
##   hurricane
## # ... with 18 more rows, 2 more variables: tropicalstorm_force_diameter <int>,
## #   hurricane_force_diameter <int>, and abbreviated variable names 1: category,
## #   2: pressure

The table shows that Sandy reached peak intensity between 5-6am on October 25, when it became a Category 3 hurricane. Overall, the number of hurricanes in 2012 which reached a peak intensity of Category 3 was two; the other one was Hurricane Michael.

Complete the table shown below, which records the number of Category 1–5 hurricanes in each decade, where each hurricane is categorized according to its peak intensity:

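Columns: decade, followed by peak category 1, 2, 3, 4, 5.

1970-1980: 8, 6, 2, 1, 2
1980-1990: 6
1990-2000: 9, 2
2000-2010: 9, 14
2010-2020: 11, 5

[Only the first row is complete. For the other rows, the given values are listed in the order they appear; which category column each belongs to is not recoverable here.]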
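The following is a sketch of one way such a table could be built, not necessarily the intended solution. It assumes the storms table that ships with dplyr, identifies a hurricane by its name and year together, and labels decades as in the table above. Because the contents of storms differ between dplyr releases, the counts it produces may not match the printed entries.

```r
library(dplyr)
library(tidyr)

hurricane_counts <- storms %>%
  # category is an ordered factor in older dplyr releases and numeric in
  # newer ones; converting through character handles both
  mutate(cat_num = as.numeric(as.character(category))) %>%
  filter(!is.na(cat_num), cat_num >= 1) %>%      # keep hurricane-strength rows
  group_by(name, year) %>%                       # names are reused across years
  summarize(peak = max(cat_num), .groups = "drop") %>%
  mutate(
    start = 10 * (year %/% 10),                  # first year of the decade
    decade = paste0(start, "-", start + 10)
  ) %>%
  count(decade, peak) %>%
  pivot_wider(names_from = peak, values_from = n,
              values_fill = 0L, names_sort = TRUE)

hurricane_counts
```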