DATA 100

Spring 2021

Final-Exam

1. (7.0 points)

(a) (2.0 pt) Recall the tips dataset that we worked with on assignments in the past, which includes data about the tip on a restaurant bill as well as the day of week and the sex of the individual. The plot below attempts to examine patterns between the tip as a percentage of the bill and the sex of the individual by the day of week (DOW).

Select the best reason below for why the data visualization is misleading or poorly constructed.

○ the y-axis should be log transformed

● the clustering of bars doesn’t allow a key comparison to be made

○ the plot suffers from overplotting

○ the bars for each day of week should be stacked on top of each other (e.g. the bar for “Thur” would have a total height of approximately 0.3)

(b) (2.0 pt) Consider the surface whose contour plot is provided below.

Which of the following gradient fields most likely corresponds to the surface shown above?

A gradient field is a 2-dimensional plot in which each point has a vector pointing from it in the direction of the surface's gradient at that point, with the length of the vector proportional to the magnitude of the gradient.

(c) (3.0 points) We have read in some data as the dataframe df. Consider a subset of df below, which contains some information on the background of various individuals in the US.

i. (2.0 pt) Suppose we want to observe the relationship between and the distributions of the AFQT (an intelligence metric, with units percentile) and log_earn_1999 (log of the individual’s earnings in 1999) variables based on whether the individual’s parents both went to college. Select the line of code below that generates the best plot to observe this relationship.

● A:

sns.kdeplot(x=df["AFQT"], y=df["log_earn_1999"],
            hue=df["mother_college"] & df["father_college"])

● B:

sns.scatterplot(x=df["AFQT"], y=df["log_earn_1999"],
                hue=df["mother_college"] & df["father_college"])

● C:

sns.lineplot(x=df["AFQT"], y=df["log_earn_1999"],
             hue=df["mother_college"] & df["father_college"])

● D:

sns.kdeplot(x="AFQT", y="log_earn_1999",
            hue=["mother_college", "father_college"], data=df)

● E:

sns.scatterplot(x="AFQT", y="log_earn_1999",
                hue=["mother_college", "father_college"], data=df)

● F:

sns.lineplot(x="AFQT", y="log_earn_1999",
             hue=["mother_college", "father_college"], data=df)

Hint: Consider overplotting.

● A

○ B

○ C

○ D

○ E

ii. (1.0 pt) Suppose we want to understand the relationship between weeks_worked_1999 and the sex of the individual. We run the following code to generate a plot:

df2 = df.groupby("zip_code").mean().reset_index()

sns.lineplot("zip_code", "log_earn_1999", data=df2)

Select the reason below for why this plot would represent a bad data visualization.

● treats a categorical variable as a continuous variable

○ treats a continuous variable as a categorical variable

○ represents a density with a feature other than area

○ does not show the relationship between the variables of interest

2. (9.0 points)

(a) (4.0 points)

Recall that a random forest is created from a number of decision trees, with each decision tree created from a bootstrapped version of the original training set. One hyperparameter of a random forest is the number of decision trees we train to create the random forest.

Define T to be the number of decision trees used to create the random forest. Let’s say we have two candidate values for T: var1 and var2. We want to perform var3-fold cross-validation to determine the optimal value of T. Assume var1, var2, and var3 are integers.

i. (2.0 pt) In this cross-validation process, how many random forests will we train? Your answer should be in terms of var1, var2, and/or var3 and should be an integer.

2 * var3

ii. (2.0 pt) In this cross-validation process, how many decision trees will we train? Your answer should be in terms of var1, var2, and/or var3 and should be an integer.

(var1 + var2) * var3
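As a quick check of the two counts above, the sketch below plugs in hypothetical values for var1, var2, and var3 (10, 50, and 5 are assumptions chosen only for illustration; the exam leaves them symbolic). Each candidate value of T needs one forest per fold, and each of those forests contains T trees.

```python
def cv_training_counts(candidate_tree_counts, n_folds):
    """Return (forests trained, trees trained) for k-fold CV over candidate values of T."""
    # One forest is trained per candidate value per fold.
    n_forests = len(candidate_tree_counts) * n_folds
    # A forest built with T trees contributes T trees in every fold.
    n_trees = sum(candidate_tree_counts) * n_folds
    return n_forests, n_trees


# Hypothetical values: var1 = 10, var2 = 50, var3 = 5.
var1, var2, var3 = 10, 50, 5
n_forests, n_trees = cv_training_counts([var1, var2], var3)
print(n_forests)  # 2 * var3 = 10
print(n_trees)    # (var1 + var2) * var3 = 300
```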

(b) (2.0 pt) Let’s say we pick three hyperparameters to tune with cross-validation. We have 5 candidate values for hyperparameter 1, 6 candidate values for hyperparameter 2, and 7 candidate values for hyperparameter 3. We perform 4-fold cross-validation to find the optimal combination of hyperparameters, across all possible combinations.

In this cross-validation process, how many random forests will we train? Your answer can be left as a product of multiple integers, e.g. “1 * 2 * 3”, or simplified to a single integer, e.g. “6”. (These are not the correct answers to the problem.)

4 * 5 * 6 * 7 = 840
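The same count can be reproduced by enumerating the grid. This is a minimal sketch: the candidate lists are placeholders (only their lengths, 5, 6, and 7, come from the problem), and `itertools.product` stands in for the nested loops over hyperparameters.

```python
from itertools import product

# Placeholder candidate lists; only their lengths matter for the count.
cands1 = list(range(5))
cands2 = list(range(6))
cands3 = list(range(7))
n_folds = 4

# Every combination of the three hyperparameters is evaluated once per fold.
combinations = list(product(cands1, cands2, cands3))
n_forests = len(combinations) * n_folds
print(len(combinations))  # 5 * 6 * 7 = 210
print(n_forests)          # 4 * 210 = 840
```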

(c) (3.0 pt) Here is some code that attempts to implement the cross-validation procedure described above. However, it is buggy. In one sentence, describe the bug below.

You may assume the following:

● X_train is a pd.DataFrame that contains our design matrix, and Y_train is a pd.Series that contains our response variable, both for the full training set.

● Assume ensemble.RandomForestClassifier(**args) creates a random forest with the appropriate hyperparameter values. The bug is not on this line.

● The candidate values for each hyperparameter have been loaded into the lists cands1, cands2, and cands3, respectively.

from sklearn.model_selection import KFold
from sklearn import ensemble
import numpy as np
import pandas as pd

kf = KFold(n_splits=4)
cv_scores = []
for cand1 in cands1:
    for cand2 in cands2:
        for cand3 in cands3:
            validation_accuracies = []
            for train_idx, valid_idx in kf.split(X_train):
                split_X_train, split_X_valid = X_train.iloc[train_idx], X_train.iloc[valid_idx]
                split_Y_train, split_Y_valid = Y_train.iloc[train_idx], Y_train.iloc[valid_idx]

                model = ensemble.RandomForestClassifier(**args)
                model.fit(X_train, Y_train)
                accuracy = np.mean(model.predict(split_X_valid) == split_Y_valid)
                validation_accuracies.append(accuracy)
            cv_scores.append(np.mean(validation_accuracies))

Each iteration of the algorithm trains a random forest on the entire training set, as opposed to the part of the training set that is not reserved for validation.
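For comparison, here is a sketch of the same loop with the model fit on the training fold only. It is not the exam's reference code: the synthetic data, the choice of n_estimators and max_depth as the tuned hyperparameters, and their candidate values are all assumptions made so that the example runs on its own, and NumPy arrays are used in place of the DataFrame and Series.

```python
import numpy as np
from sklearn import ensemble
from sklearn.model_selection import KFold

# Synthetic stand-in data (assumption; the exam's X_train and Y_train are not given).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 3))
Y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Hypothetical candidate values for two hyperparameters.
cands_n_estimators = [5, 10]
cands_max_depth = [2, 4]

kf = KFold(n_splits=4)
cv_scores = {}
for n_estimators in cands_n_estimators:
    for max_depth in cands_max_depth:
        validation_accuracies = []
        for train_idx, valid_idx in kf.split(X_train):
            split_X_train, split_X_valid = X_train[train_idx], X_train[valid_idx]
            split_Y_train, split_Y_valid = Y_train[train_idx], Y_train[valid_idx]

            model = ensemble.RandomForestClassifier(
                n_estimators=n_estimators, max_depth=max_depth, random_state=0
            )
            # Fit on the training fold only, so the validation fold stays unseen.
            model.fit(split_X_train, split_Y_train)
            accuracy = np.mean(model.predict(split_X_valid) == split_Y_valid)
            validation_accuracies.append(accuracy)
        cv_scores[(n_estimators, max_depth)] = np.mean(validation_accuracies)

# Combination with the highest mean validation accuracy.
best_params = max(cv_scores, key=cv_scores.get)
print(best_params, cv_scores[best_params])
```

Storing the scores in a dictionary keyed by the hyperparameter combination, rather than in a flat list, is a design choice of this sketch that makes it easy to look up the best combination afterwards.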

3. (14.0 points)

We are trying to train a decision tree for a classification task where 0 is the negative class and 1 is the positive class. We are given 8 data points, each with a pair of features (x1, x2).

(a) (3.0 pt)

x1 x2 y
3 4 1
2 1 0
1 3 1
5 9 0
9 6 1
7 2 1
4 7 0
8 8 1

What is the entropy at the root of the tree? Round to 4 decimal places.

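$-\left(\frac{5}{8}\ln\frac{5}{8} + \frac{3}{8}\ln\frac{3}{8}\right) = 0.6616$ (5 of the 8 points are in class 1 and 3 are in class 0; this value uses the natural logarithm, and with $\log_2$ the same expression evaluates to 0.9544)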

(b) (2.0 pt) What is the Gini impurity at the root of the tree? Note that the formula for Gini impurity is $1 - \sum_{i=1}^{c} p_i^2$, where $p_i$ is the fraction of items labelled with class $i$ and $c$ is the total number of classes.

$1 - \left(\left(\frac{5}{8}\right)^2 + \left(\frac{3}{8}\right)^2\right) = 0.46875$
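The root-node values can be checked numerically from the y column of the table. The sketch below is an illustration only; the helper names are made up for this example. It computes the entropy with both the natural logarithm and base 2, since the stated 0.6616 matches the natural-log version.

```python
import math

# y column of the table in part (a): five 1s and three 0s.
labels = [1, 0, 1, 0, 1, 1, 0, 1]


def entropy(ys, log=math.log2):
    """Entropy of a list of class labels, using the given logarithm function."""
    n = len(ys)
    probs = [ys.count(c) / n for c in set(ys)]
    return -sum(p * log(p) for p in probs)


def gini(ys):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(ys)
    return 1 - sum((ys.count(c) / n) ** 2 for c in set(ys))


print(round(entropy(labels, math.log), 4))   # natural log: 0.6616
print(round(entropy(labels, math.log2), 4))  # base 2: 0.9544
print(gini(labels))                          # 0.46875
```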

(c) (4.0 pt) Suppose we decide to split the root node with the rule $x_i \ge \beta$, where $i = 1$ or $2$. Which of the following minimizes the weighted entropy of the two resulting child nodes?

● $x_1 \ge 6$

○ $x_1 \ge 3.5$

○ $x_2 \ge 5$

○ $x_2 \ge 3.5$

○ $x_2 \ge 6.5$
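The comparison between the candidate splits can be worked out by brute force. The sketch below reads each rule as "feature >= threshold" (an assumption about the comparison direction; the two child nodes, and therefore the weighted entropy, are the same if it is read as "<=") and uses base-2 entropy, which does not change which split is smallest.

```python
import math

# (x1, x2, y) rows copied from the table in part (a).
data = [(3, 4, 1), (2, 1, 0), (1, 3, 1), (5, 9, 0),
        (9, 6, 1), (7, 2, 1), (4, 7, 0), (8, 8, 1)]


def entropy(labels):
    """Entropy (base 2) of a list of class labels; 0 for an empty node."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)


def weighted_entropy(feature, beta):
    """Weighted entropy of the children produced by the rule x_feature >= beta."""
    yes = [row[2] for row in data if row[feature] >= beta]
    no = [row[2] for row in data if row[feature] < beta]
    n = len(data)
    return len(yes) / n * entropy(yes) + len(no) / n * entropy(no)


# Candidate rules from the options: label -> (feature index, threshold).
candidates = {
    "x1 >= 6": (0, 6), "x1 >= 3.5": (0, 3.5),
    "x2 >= 5": (1, 5), "x2 >= 3.5": (1, 3.5), "x2 >= 6.5": (1, 6.5),
}
scores = {name: weighted_entropy(f, b) for name, (f, b) in candidates.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
print("best:", best)
```

The split on x1 at 6 wins because every point with x1 of at least 6 has label 1, so that child is pure and contributes no entropy.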

(d) (2.0 points)

We have decided to create a food recommendation system using a decision tree! We would like to run our decision tree to see what food it recommends in certain scenarios.

If you have trouble reading the above tree, please go to this link: https://i.imgur.com/9Z40cYP.png

i. (1.0 pt) Bob wants to eat some unhealthy food, specifically at a fast food restaurant. When asked what he’s in the mood for, he replies with “Mediterranean”. Which of the following restaurants could the decision tree recommend for Bob?

○ Chipotle

○ Taco Bell

● Dyars Cuisine

○ IBs Burgers

ii. (1.0 pt) Larry would like to eat some unhealthy food as well! However, he got a salary bonus from his job so he does not want to eat at a fast food restaurant. When asked how much he would like to pay, he replies with “I have no preference”. Which of the following restaurants could the decision tree recommend for Larry?

■ Olive Garden

■ Cheesecake Factory

□ Super Dupers Burger

■ Flemings Prime Steakhouse

(e) (3.0 pt) Joey and Andrew are each training their own decision tree for a classification task. Joey decides to limit the depth of his decision tree to depth 3 while Andrew decides to not set a limit on the depth of his decision tree. When plotting the training error, Joey’s error seems to be much higher than Andrew’s error. However, when plotting the validation error, Andrew’s error seems to be much higher than his training error as well as Joey’s error. Andrew is confused and surmises that there must be a bug in his code that is causing this to happen. What happened? Explain. What can he do to improve it? Name at least 3 things he can do to improve his error. Please limit your response to 2 sentences per reason.

He is not correct. Andrew’s high validation error and low training error are due to overfitting. Joey did not run into this problem because he limited his depth to 3. For Andrew to improve his validation error, he should try limiting the depth of his tree, pruning his decision tree, preventing splits on nodes that contain less than 1% of the samples, or using a random forest.

4. (16.0 points)

(a) (3.0 pt) Suppose we are modeling the number of calls to MangoBot food delivery service per minute. We believe that there are likely more calls around lunch time.

Which of the following feature encodings of the time of day (0.0 to 24.0, exclusive of both ends) would capture this assumption? Select all that apply.

□ time_of_day ** 2

□ np.log(12 * time_of_day)

■ 1 - np.cos(np.pi * time_of_day / 12)

■ np.exp(-(time_of_day - 12) ** 2)
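One way to see the difference between the four encodings is to find the time of day at which each one is largest. The sketch below is an illustration, not part of the exam: it uses the standard math module in place of NumPy and a half-hour grid inside (0, 24), both of which are choices made for this example.

```python
import math

# The four candidate encodings from the options above.
encodings = {
    "time_of_day ** 2": lambda t: t ** 2,
    "log(12 * time_of_day)": lambda t: math.log(12 * t),
    "1 - cos(pi * time_of_day / 12)": lambda t: 1 - math.cos(math.pi * t / 12),
    "exp(-(time_of_day - 12) ** 2)": lambda t: math.exp(-(t - 12) ** 2),
}

# Half-hour grid strictly inside (0, 24): 0.5, 1.0, ..., 23.5.
hours = [h / 2 for h in range(1, 48)]

# Time of day at which each encoded feature takes its largest value.
peaks = {name: max(hours, key=f) for name, f in encodings.items()}
for name, peak in peaks.items():
    print(f"{name}: largest at t = {peak}")
```

The first two encodings keep growing through the day and are largest at the end of the grid, while the last two peak at t = 12, which is consistent with the two options marked above.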

(b) (4.0 pt) Recall that in a binary classification task, we want our data to become linearly separable so that we can maximize the performance of our classifier. In many cases, however, our data are not directly linearly separable. As a result, we want to apply some transformation to our data so they will become linearly separable afterwards.

For the following dataset, select all transformations that can make the data linearly separable.

■ $(x_1, x_2) \to (x_1^2, x_2)$

□ $(x_1, x_2) \to (x_1, x_2^2)$

■ $(x_1, x_2) \to (x_1^2, x_2^2)$

■ $(x_1, x_2) \to (x_2^2, x_1^2)$

(c) (3.0 pt) One way to transform textual data into features is to count the frequencies for all of the words in the text.

Consider the following preprocessing steps:

i. Remove all punctuation (., ,, :, ...).

ii. Remove all stopwords (did, the, ...). Note that stopwords do not include words that negate things, such as no, not, ...

iii. Lowercase the sentence, and keep only words that consist solely of the letters a-z.

iv. Encode the sentence as a vector containing the frequencies for all the unique words in the text.

Suppose we use the frequency vector from the steps above as our feature to train a logistic regression model that predicts the sentiment of a sentence (positive, negative). In 1-2 sentences, describe a case where our model would fail and make a false prediction.

Your answer must be specific to the preprocessing steps and include an example sentence to earn credit.
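To make the four preprocessing steps concrete, here is a minimal sketch of the encoding. It is one possible illustration and not the exam's official answer: the stopword set is hypothetical (the exam names only "did" and "the"), stopwords are matched case-insensitively as an assumption, and the two example sentences are made up. Because the encoding keeps only word counts, two sentences that use the same words in a different order end up with the same feature vector.

```python
import re
from collections import Counter

# Hypothetical stopword list; the exam only gives "did" and "the" as examples.
STOPWORDS = {"did", "the", "was", "is", "a"}


def encode(sentence):
    """Apply steps i-iv and return the word-frequency encoding as a Counter."""
    # i. Remove punctuation (anything that is not a word character or whitespace).
    no_punct = re.sub(r"[^\w\s]", "", sentence)
    # ii. Remove stopwords (case-insensitive match is an assumption of this sketch).
    words = [w for w in no_punct.split() if w.lower() not in STOPWORDS]
    # iii. Lowercase, then keep only words made entirely of the letters a-z.
    words = [w.lower() for w in words]
    words = [w for w in words if re.fullmatch(r"[a-z]+", w)]
    # iv. Count how often each remaining word occurs.
    return Counter(words)


a = encode("The food was good, not bad.")
b = encode("The food was bad, not good.")
print(a)
print(a == b)  # both sentences map to the same counts
```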