STAT 20: Final

Fall 2021

Multiple Choice Questions

Circle the best choice.

1. Consider a random variable X with an unknown distribution from which we have observed 20 draws: x1, x2, ..., x20, which constitute our sample of data. The distribution of the sample is shown below using a bar chart.

[Figure: bar chart of the sample; the horizontal axis runs from 4 to 8.]

Which of the following is the most likely distribution of the random variable X?

(a) X ~ Bern(p = .5)

(b) X ~ Bern(p = .6)

(d) X ~ Binom(n = 6, p = .5)

(e) X ~ Normal(µ = 0, σ = 1)

(f) X ~ Normal(µ = 6, σ = 1)

2. What is the median of this sample?

(a) 5

(b) 5.5

(d) 6.5

(e) 7

3. We expect the mean of this sample to be ______ the median (circle the answer that goes in the blank).

(a) less than

(b) equal to

(d) (cannot tell based on the information given)

4. Select the histogram that shows the distribution of the standardized form of the same sample from the previous page, z1, z2, ..., z20, where each zi = (xi − x̄) / s, with x̄ the sample mean and s the sample standard deviation.

(a) A

(b) B
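As a rough illustration of the standardization in question 4, the sketch below uses a made-up sample of 20 values (not the data behind the exam's bar chart) and assumes each value is centered by the sample mean and scaled by the sample standard deviation. The names x and z are chosen here for illustration.

```r
# Hypothetical sample of 20 draws; these values are invented for illustration
# and are not the data behind the exam's bar chart.
x <- c(4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8)

# Standardize: subtract the sample mean, then divide by the sample
# standard deviation (sd() uses the n - 1 denominator).
z <- (x - mean(x)) / sd(x)

# After standardizing, the sample mean is 0 and the sample standard
# deviation is 1, up to floating-point error.
mean(z)
sd(z)
```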

5. Select the correct matching between each of the three distributions represented as a histogram and as a boxplot.

[Figure: three histograms (a, b, c) and three boxplots (1, 2, 3). The surviving axis ranges are 50 to 70, 0 to 100, and 0 to 8.]

(a) a-1, b-2, c-3

(b) a-2, b-1, c-3

(d) a-1, b-3, c-2

(e) a-2, b-3, c-1

(f) a-3, b-1, c-2

6. Which of the three histograms above exhibits the greatest right skew?

(a) a

(b) b

7. A survey that asked 45 Cal seniors about their sleeping habits found a sample median of 6.5 hours of sleep per night. Which method can be used to form a confidence interval for that median?

(a) the t distribution

(b) the binomial distribution

(d) permuting the sample

(e) taking draws from the population under the null hypothesis

8. The 95% confidence interval for the median hours of sleep per night was (5.5, 7.4). If you had reduced the confidence level to 80%, you would expect the width of the resulting interval to:

(a) increase

(b) stay the same

9. The 95% confidence interval for the median hours of sleep per night was (5.5, 7.4). If the survey had instead used a sample size of 90, you would expect the width of the resulting interval to:

(a) increase

(b) stay the same

10. A multiple least squares regression model is inappropriate when

(a) you have a mixture of numerical and categorical explanatory variables.

(b) you want to make predictions for one variable based on information about another variable.

(d) there is non-constant variance in the residual plot.

11. Which setting is most appropriate for using a logistic regression model?

(a) Predicting the number of years that it takes a student to graduate from Cal.

(b) Describing the relationship between season performance and whether the Cal football team will defeat Stanford.

(d) Predicting the lifespan of penguins near the Palmer Research Station in Antarctica.

12. Which of the following is FALSE regarding logistic regression?

(a) There is no assumption of normally distributed errors.

(b) It is used only in circumstances where your explanatory/independent/predictor variables are two-level categorical variables.

(d) Geometrically, the intercept determines the left-right shift of the s-curve on a scatterplot of the data.

13. Which is the most accurate statement regarding a logistic regression model that results in estimates b0 = 0.789 and b1 = -2.38, both of which are statistically significant?

(a) Increasing values of x are associated with a lower probability of y = 1.

(b) The predicted value for y when x = 0 is 0.789 (using a decision threshold of .5).

(d) These estimates are consistent with the null hypotheses that their true values are zero.
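To make the roles of b0 and b1 in question 13 concrete, the following sketch converts the linear predictor into a predicted probability with the inverse-logit function. The helper predict_prob and the grid x_grid are hypothetical names and values introduced only for illustration; the two estimates are the ones quoted in the question.

```r
# Estimates quoted in question 13.
b0 <- 0.789
b1 <- -2.38

# Inverse-logit link: maps the linear predictor b0 + b1 * x to a
# probability between 0 and 1. plogis() is the logistic CDF in base R.
predict_prob <- function(x) plogis(b0 + b1 * x)

# x_grid is an arbitrary, hypothetical set of x values.
x_grid <- c(-1, 0, 1, 2)

# Predicted probability that y = 1 at each x in the grid.
data.frame(x = x_grid, prob = predict_prob(x_grid))
```

A predicted class would then follow from comparing each probability with a decision threshold such as .5.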

14. Adjusted R² (R²adj) and AIC are scores that you can calculate for a given multiple least squares regression model and a logistic model, respectively. They are both constructed to have what characteristic?

(a) Both scores will improve as you increase the complexity of the model.

(b) Both scores will improve as you decrease the complexity of the model.

(d) Both scores improve as a model more closely describes the data and improve as the model complexity grows.

15. Electric Cars: The model shown below was fit to a sample of electric cars on which researchers observed the age, efficiency (measured in miles per gallon), and brand (A or B).

Consider the model

predicted mpg = b0 + b1 × age + b2 × brandB + b3 × age × brandB,

where brandB is a dummy variable that takes on the value 1 if the car is of brand B, and 0 if it is of brand A.

[Figure: scatterplot with age (0 to 15) on the horizontal axis; points are distinguished by brandB (0 or 1).]

(a) What will be the sign of the estimated coefficient on age?

i. Negative

ii. Zero

iii. Positive

iv. No way to tell, even roughly, from the information given

(b) What will be the sign of the estimated coefficient on brandB?

i. Negative

ii. Zero

iii. Positive

iv. No way to tell, even roughly, from the information given

(c) What will be the sign of the estimated coefficient on the interaction between age and brandB?

i. Negative

ii. Zero

iii. Positive

iv. There is no interaction coefficient

v. No way to tell, even roughly, from the information given

16. Diamonds, models: The following scatter plot shows data on a sample of 300 diamonds drawn from the records of a jeweler in New York City in 2011. It displays the relationship between their size (measured in carats), the quality of their cut (measured from a low of 1 to a high of 5), and their price.

Consider two linear models for the price of diamonds, Model 1 (m1) and Model 2 (m2), detailed below.

m1 <- lm(price ~ quality, data = diamonds)
coef(summary(m1))
##               Estimate Std. Error   t value      Pr(>|t|)
## (Intercept)  8.0795838 0.19169945 42.147141 1.379494e-127
## quality     -0.1130969 0.04800494 -2.355943  1.912332e-02

m2 <- lm(price ~ quality + carat, data = diamonds)
coef(summary(m2))
##               Estimate Std. Error   t value      Pr(>|t|)
## (Intercept) 4.32492877 0.08133782 53.172419 8.320685e-154
## quality     0.03193982 0.01338683  2.385914  1.766320e-02
## carat       3.87467531 0.06402920 60.514191 4.420086e-169

(a) Suppose we aim to use these models and sample of data to make inferences on a population. In 1 - 2 sentences, describe the characteristics of a population on which it would be sensible to make inferences.

(b) Please construct a 95% confidence interval for the coefficient associated with cut quality under Model 1 (though it is plotted here as a discrete variable, cut is ordinal, and therefore can have a single slope). Please use the actual numbers in the construction, but there is no need to go through the arithmetic to get the final lower and upper bounds of the interval.

(c) Define as LB and UB the lower and upper bounds, respectively, of the interval that you calculated from this sample of data in part (b). Which of the following interpretations is most appropriate? (circle one)

i. We’re 95% confident that the parameter associating cut quality with the price of a diamond is between LB and UB.

ii. We’re 95% confident that the estimated coefficient associating cut quality with the price of a diamond is between LB and UB.

iii. There is a 95% probability that the parameter associating cut quality with the price of a diamond is between LB and UB.

iv. There is a 95% probability that the estimated coefficient associating cut quality with the price of a diamond is between LB and UB.

(d) Why do you think that the slope for cut quality is negative in Model 1 and positive in Model 2? Answer in 1 - 2 sentences.

(e) Based on the scatterplot, do you have any reservations about fitting a linear model to this data? Answer in 1 sentence.

(f) Challenging: Once you have a model that you’re confident does a good job of predicting diamond prices, you plan to use it in a small business operation of buying and reselling diamonds. Say that diamonds come on the market with their quality of cut, carat, and asking price all provided. Which diamonds would you buy? At what price would you resell them?
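For the confidence interval requested in part (b) of this question, one possible way to assemble the pieces in R is sketched below. The estimate and standard error are copied from the Model 1 output; the residual degrees of freedom are an assumption (all 300 diamonds used, two estimated coefficients), and the variable names are chosen here for illustration.

```r
# Estimate and standard error for quality, copied from the Model 1 output.
est <- -0.1130969
se  <- 0.04800494

# Assumption: all 300 diamonds were used in the fit, so Model 1 has
# 300 - 2 residual degrees of freedom (intercept and one slope).
df <- 300 - 2

# Critical value of the t distribution for a two-sided 95% interval.
t_star <- qt(0.975, df)

# Interval: estimate plus or minus critical value times standard error.
ci <- est + c(-1, 1) * t_star * se
ci
```

With the fitted model object itself available, `confint(m1, "quality", level = 0.95)` should give the same interval without copying numbers by hand.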

17. Diamonds, data: The following questions concern the data visualized in the scatterplot in the previous question.

(a) Provide a sketch below of what the first 5 rows of the diamonds data frame could look like, being sure to convey the unit of observation, the names of each of the variables, and plausible values that they could take.

(b) Suppose you want to create a second data frame that considers only small diamonds (less than one carat) and records the average price of the diamonds in each level of quality (1 to 5). Describe step-by-step how you can get to this data frame from the original data and include a sketch of the first few rows of what this data frame might look like. You can either provide dplyr code (perfect syntax not required) or write in English using the relevant function and column names.

How to get there:

Sketch of resulting data frame:
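One possible dplyr pipeline for the filter-then-summarize steps described in part (b) is sketched below. The small data frame is a hypothetical stand-in with invented values; only the column names price, quality, and carat come from the model formulas in the exam, and the names small_by_quality and avg_price are assumptions made for this example.

```r
library(dplyr)

# Hypothetical stand-in for the diamonds data frame: one row per diamond.
# The values are invented; only the column names follow the exam's models.
diamonds <- data.frame(
  price   = c(6.9, 7.4, 8.8, 7.1, 9.2, 7.6),
  quality = c(1, 3, 5, 3, 4, 1),
  carat   = c(0.5, 0.7, 1.2, 0.6, 1.5, 0.9)
)

small_by_quality <- diamonds %>%
  filter(carat < 1) %>%                 # keep diamonds under one carat
  group_by(quality) %>%                 # one group per level of cut quality
  summarize(avg_price = mean(price))    # average price within each group

small_by_quality
```

The result has one row per quality level that appears among the small diamonds, so a level with no diamonds under one carat would be absent from the output.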