
STATS 369

SECOND SEMESTER, 2021

STATISTICS

Data Science Practice

1. (30 marks) Tree-based models.

(a) For each of decision trees, random forests and boosted trees, explain what size of tree is used, why, and how the optimal size is found. (6 marks)

(b) How do trees account for interactions? (2 marks)

(c) In boosted trees, what does the learning rate parameter a do, what range of values can it take, and what are the advantages of using lower or higher values? (4 marks)
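A minimal sketch of the shrinkage idea behind the learning rate, assuming squared-error loss, one-split trees (stumps) and a single numeric predictor. `fit_stump` and `boost` are hypothetical helpers written for illustration, not functions from any boosting library. Each new tree is fitted to the current residuals, and only `learning_rate` times its prediction is added to the ensemble.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares stump: one split on x, predicting the mean residual on each side."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse = np.inf
    best_thr, best_left, best_right = xs[0], rs.mean(), rs.mean()
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # no split point between tied x values
        left, right = rs[:k], rs[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best_thr = (xs[k - 1] + xs[k]) / 2
            best_left, best_right = left.mean(), right.mean()
    return lambda z: np.where(z <= best_thr, best_left, best_right)


def boost(x, y, learning_rate, n_trees):
    """Boosting with squared-error loss; returns the training MSE after each tree."""
    pred = np.full(len(y), y.mean())
    history = []
    for _ in range(n_trees):
        stump = fit_stump(x, y - pred)           # fit the current residuals
        pred = pred + learning_rate * stump(x)   # add only a shrunken copy of the tree
        history.append(np.mean((y - pred) ** 2))
    return history


x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)
for rate in (1.0, 0.1):
    mse = boost(x, y, learning_rate=rate, n_trees=5)
    print(f"learning rate {rate}: training MSE after 5 trees = {mse[-1]:.4f}")
```

With the same number of trees, the smaller rate leaves a higher training error because each tree's contribution is shrunk; smaller rates are therefore usually paired with more trees, with the number of trees chosen by cross-validation.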

(d) Describe how variable importance is measured in random forests and how it differs from the corresponding measure in decision trees. (6 marks)
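Random forests often measure importance by permutation: shuffle one predictor in each tree's out-of-bag sample and record how much the prediction error increases, averaged over trees. A minimal sketch of the permutation step, assuming a regression target and a generic `predict` function; to keep it short it uses a least-squares fit as a stand-in model and permutes columns of the training data rather than per-tree out-of-bag samples. `permutation_increase_in_mse` is a hypothetical helper, not a library function.

```python
import numpy as np


def permutation_increase_in_mse(predict, X, y, rng):
    """Increase in MSE when each column of X is shuffled, breaking its link with y."""
    base_mse = np.mean((y - predict(X)) ** 2)
    increases = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        increases.append(np.mean((y - predict(X_perm)) ** 2) - base_mse)
    return np.array(increases)


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=500)   # only column 0 drives y

# Stand-in model (least squares) in place of a fitted forest
coef, *_ = np.linalg.lstsq(X, y, rcond=None)


def predict(Z):
    return Z @ coef


print(permutation_increase_in_mse(predict, X, y, rng))
```

Shuffling the informative column sharply increases the error, while shuffling the noise column barely changes it.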

(e) Below is a summary of a random forest model for predicting vehicle engine size (CC_RATING) from the New Zealand Vehicles data set we saw in lectures.

(i) What does Mtry refer to and why is it an important parameter in random forests? (2 marks)

(ii) Using the example of an average car with engine size around 2500cc, explain in simple terms the predictive accuracy of this model. (2 marks)

(iii) Overall, does it look like this model does a good job? Why or why not? (2 marks)

Ranger result

Call:
 ranger(CC_RATING ~ ., data = nzvehicles)

Type:                             Regression
Number of trees:                  500
Sample size:                      81062
Number of independent variables:  12
Mtry:                             3
Target node size:                 5
Variable importance mode:         none
Splitrule:                        variance
OOB prediction error (MSE):       122099.7
R squared (OOB):                  0.8338549
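The OOB MSE is in squared cc, so its square root puts the error back on the engine-size scale. A small sketch of that arithmetic, using only the numbers printed above and the 2500 cc figure from part (ii):

```python
import math

# Values copied from the ranger summary above
oob_mse = 122099.7          # "OOB prediction error (MSE)", in cc^2
typical_engine_cc = 2500    # the "average car" used in part (ii)

oob_rmse = math.sqrt(oob_mse)                   # back on the cc scale
relative_error = oob_rmse / typical_engine_cc   # error as a share of a typical engine

print(f"OOB RMSE: about {oob_rmse:.0f} cc")
print(f"Relative to a {typical_engine_cc} cc engine: about {relative_error:.0%}")
```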

(f) The output below is from printcp in rpart using the same New Zealand Vehicles data set as above.

(i) Based on this table, what is the size of the best model? (2 marks)

(ii) What is the MSPE for the best model? How does this compare to the random forest model above? (2 marks)

(iii) What tells you that the tree is not large enough and what would you change in the call to get a larger tree? (2 marks)

Regression tree:
rpart(formula = CC_RATING ~ ., data = nzvehicles, cp = 0.01)

Root node error: 5.9572e+10/81062 = 734889

n= 81062

         CP nsplit rel error  xerror      xstd
1  0.287073      0   1.00000 1.00003 0.0056506
2  0.085442      1   0.71293 0.71296 0.0051144
3  0.067620      2   0.62748 0.62915 0.0039600
4  0.034359      3   0.55987 0.56329 0.0037940
5  0.033084      4   0.52551 0.53026 0.0037786
6  0.025222      5   0.49242 0.49587 0.0037621
7  0.024298      6   0.46720 0.47702 0.0037458
8  0.018150      7   0.44290 0.44635 0.0037708
9  0.015530      8   0.42475 0.42823 0.0036789
10 0.013491      9   0.40922 0.41661 0.0036022
11 0.010830     10   0.39573 0.40327 0.0035616
12 0.010000     11   0.38490 0.39273 0.0034879
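In printcp output, rel error and xerror are scaled by the root node error, so multiplying the cross-validated xerror by 734889 gives a cross-validated mean squared prediction error. The sketch below does that arithmetic on the printed columns and also applies the one-standard-error rule (smallest tree whose xerror is within one xstd of the minimum). It assumes the rows are in increasing order of tree size, as printcp lists them, and it does not refit anything.

```python
# Columns copied from the printcp table above, one entry per row
root_node_error = 734889      # "Root node error: 5.9572e+10/81062 = 734889"
rf_oob_mse = 122099.7         # OOB MSE from the ranger summary
xerror = [1.00003, 0.71296, 0.62915, 0.56329, 0.53026, 0.49587,
          0.47702, 0.44635, 0.42823, 0.41661, 0.40327, 0.39273]
xstd = [0.0056506, 0.0051144, 0.0039600, 0.0037940, 0.0037786, 0.0037621,
        0.0037458, 0.0037708, 0.0036789, 0.0036022, 0.0035616, 0.0034879]

# Row with the lowest cross-validated relative error
best = min(range(len(xerror)), key=lambda i: xerror[i])

# xerror is relative to the root node error; rescale to get the CV MSPE
cv_mspe = xerror[best] * root_node_error

# One-standard-error rule: first (smallest) tree within one xstd of the minimum
threshold = xerror[best] + xstd[best]
one_se = next(i for i, e in enumerate(xerror) if e <= threshold)

print(f"lowest xerror: row {best + 1} ({xerror[best]})")
print(f"cross-validated MSPE: about {cv_mspe:.0f}")
print(f"ratio to random forest OOB MSE: {cv_mspe / rf_oob_mse:.2f}")
print(f"one-SE rule selects row {one_se + 1}")
```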

2. (20 marks) Your client tasks you with developing a computer vision application that classifies New Zealand native tree species in a large, dense forest area. You are provided with a sample of 220 colour photos taken by a group of phytologists and conservation enthusiasts. You first try a ‘convolutional neural network’.

(a) Why is a convolutional neural network a good choice here? (4 marks)

(b) What is a ‘convolution’ operation in this context? (3 marks)
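A minimal NumPy sketch of the operation, assuming a single grey-scale channel, no padding and stride 1; `conv2d_valid` is a hypothetical helper. A small kernel (filter) slides over the image and, at each position, the elementwise products of kernel and image patch are summed to give one value of the feature map. Like most deep learning libraries, it does not flip the kernel (strictly, a cross-correlation). For colour photos the kernel would also span the three RGB channels and sum over them, and in a CNN the kernel weights are learned rather than fixed as they are here.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Toy 5x5 grey-scale "image": dark left part, bright right part
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hypothetical 3x3 vertical-edge detector
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map responds strongly only where the kernel window straddles the dark-to-bright boundary, which is how early CNN layers pick up edges and textures such as leaf outlines.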

(d) The data set is too small to train the convolutional neural network to high accuracy. What other deep learning predictive modelling method, requiring less training data, would you consider? Briefly describe how this alternative method works. (4 marks)

(e) Your client uses unmanned aerial drones to collect more data from the entire forest, including difficult-to-reach areas (e.g. cliff edges). If you apply the same model to this larger data set, how would this affect the overall error rate? Justify your answer. (6 marks)

3. (18 marks) Given a particular value of c, suppose we estimate the regression coefficients in a regression model by minimising

$$\sum_{i=1}^{n}\left(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\right)^2 \quad\text{subject to}\quad \sum_{j=1}^{p}\left|\beta_j\right|^2 \le c$$

(a) What is the regularisation method described here? (3 marks)

For the rest of this question, i.e. for each of parts 3(b) - 3(e), indicate which of statements i. to v. is correct, and briefly justify each answer in one or two sentences.

(b) As we gradually increase c from 0, the training RSS will: (3 marks)

i. Remain the same.

ii. Increase initially, and then start decreasing in an inverted U shape.

iii. Decrease initially, and then start increasing in a U shape.

iv. Steadily increase.

v. Steadily decrease.

(d) Repeat 3(b) for (squared) bias. (3 marks)

(e) Repeat 3(b) for variance. (3 marks)

(f) Repeat 3(b) for irreducible error. (3 marks)

4. (32 marks) Churn prediction is a common classification use case for software-as-a-service (SaaS) companies. It models the likelihood of a customer cancelling a subscription to a service (e.g. a monthly subscription to Netflix). To build a churn model, companies collect a large number of variables relating to different aspects of the customer. For instance:

• Customer features: basic demographic information such as gender, age, income level, and education level.
• Usage features: number of users logged in, time spent, time since last login, most frequently performed actions.
• Service features: number of service interactions made, most frequent questions raised, satisfaction scores from customer surveys.
• Channel features: device type, most frequent mode of engaging with the service.

In general, the longer or more often a customer uses a service, the less likely they are to churn from it. It is also common to align the prediction time frame with the subscription plan; e.g. if a company offers a monthly plan, it would expect you to run the churn prediction every month using the most recent quarter's usage data.

Suppose a start-up SaaS company has a pool of 1100 customers and collects 320 variables for each customer. You design a classifier that predicts their monthly churn (Yes/No) with a negative predictive value (NPV) of 99.27% and a specificity of 89.05%, and it is estimated that 2% of customers are actual churners.
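The quantities named in this question can all be read off a 2×2 confusion matrix with "churn = Yes" as the positive class. A small sketch of the definitions; `classification_metrics` is a hypothetical helper, and the split of counts in the usage lines is made up for illustration (only the 1100 customers and the 2% churn rate come from the question), so it does not correspond to the classifier described here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Common confusion-matrix summaries, treating 'churn = Yes' as the positive class."""
    return {
        "ppv_precision": tp / (tp + fp),   # P(actual churner | predicted churner)
        "npv": tn / (tn + fn),             # P(actual non-churner | predicted non-churner)
        "recall": tp / (tp + fn),          # sensitivity: share of churners caught
        "specificity": tn / (tn + fp),     # share of non-churners correctly cleared
        "error_rate": (fp + fn) / (tp + fp + fn + tn),
    }


n_customers = 1100
n_churners = round(0.02 * n_customers)    # 2% of 1100 = 22 actual churners
n_stayers = n_customers - n_churners      # 1078 actual non-churners

# Hypothetical split of those 22 churners and 1078 non-churners
tp, fn = 10, n_churners - 10
fp, tn = 50, n_stayers - 50
for name, value in classification_metrics(tp, fp, fn, tn).items():
    print(f"{name}: {value:.4f}")
```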

(a) Present a completed confusion matrix of the classiﬁer. (4 marks)

(b) What are the positive predictive value (PPV) and recall of the classifier? Show your working. (3 marks)

(c) Why would you expect each of lasso regression, classification trees, and neural networks to perform well or badly on this problem? (15 marks)

(d) Churners are a small proportion of the customer base, so your data contain very few observations of them. Consequently, your classifier could generate a low probability even for a true churner. How would you make your classifier more likely to pick up these churners? In doing so, what is the likely impact on the overall error rate, and why? (6 marks)
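One common way to flag more churners is to lower the probability threshold at which a customer is classified as "Yes" (reweighting or oversampling the churners during training are alternatives). A toy sketch, with made-up scores for 20 customers of whom 2 churn; `recall_and_error` is a hypothetical helper. It illustrates the usual trade-off: recall rises, while the overall error rate can also rise because more non-churners get flagged.

```python
import numpy as np

# Hypothetical scores for 20 customers: 18 non-churners followed by 2 churners
y_true = np.array([False] * 18 + [True, True])
p_churn = np.array([0.02, 0.03, 0.05, 0.05, 0.08, 0.10, 0.10, 0.12, 0.12,
                    0.15, 0.15, 0.18, 0.18, 0.19, 0.19, 0.20, 0.25, 0.30,
                    0.35, 0.60])


def recall_and_error(y_true, p_churn, threshold):
    """Recall and overall error rate when scores >= threshold are classed as churn."""
    y_pred = p_churn >= threshold
    recall = np.sum(y_pred & y_true) / np.sum(y_true)
    error_rate = np.mean(y_pred != y_true)
    return recall, error_rate


for threshold in (0.5, 0.2):
    recall, error_rate = recall_and_error(y_true, p_churn, threshold)
    print(f"threshold {threshold}: recall = {recall:.2f}, error rate = {error_rate:.2f}")
```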

(e) Describe one aspect of the data and one aspect of the modelling that you would need to consider changing or incorporating if you want to shift focus from short-term (e.g. monthly) prediction to long-term (e.g. yearly) prediction. (4 marks)