BUSI3122-E1

INTRODUCTION TO DATA SCIENCE: BIG DATA ANALYSIS IN BUSINESS

1. You are in an interview, and the interviewers notice that you have taken a data science class.

a) They ask you about what you have learned there. Besides talking about modelling stuff, you want to give a bigger picture. Explain why it is important to think about data science projects strategically, with respect to making internal investments. What sort of investments might you have to make? (50/100)

b) Now they are interested and ask you if you believe a firm can achieve sustained competitive advantage from data science. Of course you start with “Well, it depends...”, then you want to go on. Give 5 reasons why data science may indeed give sustained competitive advantage, even though the basic data science technologies are easily acquired/replicated. Be as precise as possible. (50/100)

2. A student was trying to figure out ‘what factors may motivate a person to take out a personal bank loan, and how can this group of people be classified?’. In the report, the student used a simple dataset containing THREE variables: marital status, residence situation and loan status (whether or not the customer has a loan). A summary of the dataset is presented below.

Marital Status	Housing	Loan	No. of Customers
Single	Yes	Yes	10
Divorced	Yes	Yes	100
Married	Yes	Yes	100
Single	Yes	No	500
Divorced	Yes	No	300
Married	Yes	No	1000
Single	No	Yes	50
Divorced	No	Yes	50
Married	No	Yes	50
Single	No	No	340
Divorced	No	No	100
Married	No	No	800

The student was trying to build a classification tree. Please help the student do the following:

a) Identify the target variable and conduct the tree induction process through information gain. You are also supposed to present the final tree model. (40/100)

b) Assuming we make predictions based on the majority class of each leaf, draw the confusion matrix for this result. (10/100)

c) In this context, explain the procedure to prevent overfitting. (30/100)

d) How are you going to use this model in practice, e.g. for banks? (20/100)
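
As a rough aid for checking hand calculations in parts (a) and (b), the information gain of each candidate split can be computed directly from the summary table. The sketch below is not a model answer. It assumes that Loan is the target variable, that entropy is measured in bits, and that the final tree splits on both attributes (giving six leaves). The names `ROWS`, `loan_counts`, `information_gain` and `majority_leaf_confusion` are hypothetical helpers introduced here for illustration.

```python
from math import log2

# Rows copied from the Question 2 table:
# (marital status, housing, loan, number of customers)
ROWS = [
    ("Single", "Yes", "Yes", 10), ("Divorced", "Yes", "Yes", 100),
    ("Married", "Yes", "Yes", 100), ("Single", "Yes", "No", 500),
    ("Divorced", "Yes", "No", 300), ("Married", "Yes", "No", 1000),
    ("Single", "No", "Yes", 50), ("Divorced", "No", "Yes", 50),
    ("Married", "No", "Yes", 50), ("Single", "No", "No", 340),
    ("Divorced", "No", "No", 100), ("Married", "No", "No", 800),
]


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def loan_counts(rows):
    """Return [customers with a loan, customers without a loan]."""
    yes = sum(n for _, _, loan, n in rows if loan == "Yes")
    no = sum(n for _, _, loan, n in rows if loan == "No")
    return [yes, no]


def information_gain(rows, attr_index):
    """Parent entropy minus the weighted entropy of the children
    obtained by splitting on column attr_index (0 = marital, 1 = housing)."""
    total = sum(r[3] for r in rows)
    weighted = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r for r in rows if r[attr_index] == value]
        weight = sum(r[3] for r in subset) / total
        weighted += weight * entropy(loan_counts(subset))
    return entropy(loan_counts(rows)) - weighted


def majority_leaf_confusion(rows):
    """Confusion counts keyed by (actual, predicted), assuming one leaf per
    (marital, housing) combination and majority-class prediction.
    Ties are predicted as "No" (an arbitrary choice for this sketch)."""
    leaves = {}
    for marital, housing, loan, n in rows:
        leaves.setdefault((marital, housing), {"Yes": 0, "No": 0})[loan] += n
    matrix = {(a, p): 0 for a in ("Yes", "No") for p in ("Yes", "No")}
    for counts in leaves.values():
        predicted = "Yes" if counts["Yes"] > counts["No"] else "No"
        for actual in ("Yes", "No"):
            matrix[(actual, predicted)] += counts[actual]
    return matrix


ig_marital = information_gain(ROWS, 0)
ig_housing = information_gain(ROWS, 1)
confusion = majority_leaf_confusion(ROWS)

print(f"IG(marital status) = {ig_marital:.4f}")
print(f"IG(housing)        = {ig_housing:.4f}")
print("Confusion counts (actual, predicted):", confusion)
```

Under these assumptions, marital status gives the larger information gain at the root, and every one of the six leaves has a "No" majority, so the majority-vote tree never predicts a loan. This observation is relevant when discussing parts (c) and (d).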

3. I would like to see whether my customers tend to cluster in understandable groups.

a) What precisely is the difference between segmenting my customers using clustering and segmenting them using tree induction, and when would I use one rather than the other? What is the practical difference, i.e., where in the data mining process would the main differences arise and what would they be? (30/100)

b) Describe the important concept that I need in order to apply clustering to my customer data. (10/100)

c) Considering that you answered part (b), what might still be unclear for you before running the k-means algorithm? (10/100)

d) Describe another type of clustering method that I should consider. How is it different from k-means? (20/100)

e) Once I get clusters, it is important for me to see if I can understand the meaning of the clusters. Describe three ways to help understand the resultant clustering (as discussed in the book). (30/100)

4. Online retailer ABC is a Taobao seller, and it has an extensive dataset on its customers, including gender, age, visiting history, demographic information by zip code, and past purchases. The company is planning to launch a few new products and send an invitation to some of its customers (the targeting cost is $2 per individual). It has an unconstrained budget for constructing targeting models and running experiments under the following assumptions:

Customers' spending may vary.

Customers may spontaneously make a purchase (even when not targeted).

Targeting cost is fixed (TarC = $2).

Other than the targeting cost, there are no additional costs for customers who are targeted and decide not to buy.

You have been asked to build several data mining models that would suggest which customers should be targeted. Use the expected value framework to determine which models should be used to address the problem.

Note: It is sufficient to write down the correct expected value equations to identify the models that should be constructed. You need to consider different situations and build individual models for different situations.
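
For orientation, one possible way to set up the expected value comparison is sketched below. It is an illustration under stated assumptions, not the required answer. The notation is introduced here and does not come from the question: $x$ is a customer's feature vector, $T$ and $\neg T$ denote targeted and not targeted, $B$ denotes a purchase, and $v_T(x)$, $v_{\neg T}(x)$ are the expected profit from a purchase in each situation (functions of $x$ because spending may vary).

\[
EV_{T}(x) = p(B \mid x, T)\, v_{T}(x) - \mathrm{TarC}, \qquad
EV_{\neg T}(x) = p(B \mid x, \neg T)\, v_{\neg T}(x)
\]

\[
\text{target } x \iff p(B \mid x, T)\, v_{T}(x) - p(B \mid x, \neg T)\, v_{\neg T}(x) > \mathrm{TarC}
\]

Read this way, each term on the left-hand side of the decision rule corresponds to a quantity that has to be estimated from data: a purchase probability and a purchase value for the targeted situation, and the same pair for the untargeted situation, since customers may buy spontaneously.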