
COMM1190: DATA, INSIGHTS AND DECISIONS

TERM 2 2022

EXAMINATION PRACTICE QUESTIONS

QUESTION 1

You have been brought in as a Data Science consultant on a court case. A chemical company has been found negligent after a chemical spill at one of their plants. All that remains in the court case is to decide on the extent of the damages for which the company is liable. One way the court has been deciding on this amount is to look at the impact the spill has had on the value of houses located near to the chemical plant where the spill occurred.

As the expert witness, you have been asked to evaluate some alternative strategies to estimate the impact on housing prices (price). Strategy A involves taking a sample of sales that occurred after the spill, where the houses are classified as either being close to the plant or not. This classification is recorded in a variable near that is equal to 1 if the house was deemed to be close to the chemical plant and zero otherwise. Then a regression analysis is performed using the following model (MA):

MA: $\text{price}_i = \beta_0 + \beta_1 \text{near}_i + u_i$.

Strategy B involves taking a sample of sales for houses near to the plant but where some sales occurred before the spill and some after. The variable after is equal to 1 if the house was sold after the spill and zero if the sale was before. Then a regression analysis is performed using the following model (MB):

MB: $\text{price}_i = \beta_0 + \beta_1 \text{after}_i + u_i$.
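As an illustration of what models such as MA and MB compute, the following Python sketch regresses price on a single 0/1 variable and checks that the fitted intercept equals the mean price of the group coded 0, while the fitted slope equals the difference in mean price between the two groups. The data are simulated and purely hypothetical: the values 130, -40 and 20 are assumptions chosen for illustration and are not the case data, and the helper name `ols_dummy` is invented here.

```python
import numpy as np


def ols_dummy(y, d):
    """OLS of y on an intercept and a 0/1 dummy d; returns (intercept, slope)."""
    X = np.column_stack([np.ones(len(d)), np.asarray(d, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1]


# Hypothetical simulated sample (NOT the case data); price in thousands.
rng = np.random.default_rng(0)
near = rng.integers(0, 2, size=200)
price = 130.0 - 40.0 * near + rng.normal(0.0, 20.0, size=200)

b0, b1 = ols_dummy(price, near)
mean_far = price[near == 0].mean()
mean_near = price[near == 1].mean()

print(f"intercept = {b0:.2f}, mean of group 0 = {mean_far:.2f}")
print(f"slope = {b1:.2f}, difference in means = {mean_near - mean_far:.2f}")
```

The same identity holds with after in place of near, so each strategy amounts to a simple comparison of two group means.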

Part A.

Explain A and B as strategies to estimate the impact of the chemical spill and critically evaluate each of them. Is either preferable to the other?

Part B.

Suggest an alternative regression model that is preferable to MA given that you only have data from after the spill. Does this address all the criticisms of Strategy A that you outlined in Part A?

Part C.

Models MA and MB are estimated using housing data, and the results are given below. How do you interpret these results? (Note that price is expressed in thousands of dollars.)

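$\widehat{\text{price}} = \underset{(4.0)}{131.9} - \underset{(7.6)}{40.0}\,\text{near}$

$n = 142$, $R^2 = 0.165$, standard errors in parentheses.

$\widehat{\text{price}} = \underset{(5.9)}{63.7} + \underset{(9.1)}{28.3}\,\text{after}$

$n = 96$, $R^2 = 0.094$, standard errors in parentheses.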
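One way to start reading output of this kind is to divide each slope estimate by its standard error. The short Python sketch below does this for the two reported slopes and also forms approximate 95% intervals. The critical value 1.96 is a large-sample normal approximation assumed here, and the function names are illustrative only.

```python
def t_stat(coef, se):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / se


def approx_ci95(coef, se, crit=1.96):
    """Approximate 95% interval; crit=1.96 is a large-sample assumption."""
    return coef - crit * se, coef + crit * se


# Slope estimates and standard errors as reported above (price in thousands).
results = {"near": (-40.0, 7.6), "after": (28.3, 9.1)}

summary = {}
for name, (coef, se) in results.items():
    summary[name] = (t_stat(coef, se), approx_ci95(coef, se))
    t, (low, high) = summary[name]
    print(f"{name}: t = {t:.2f}, approx. 95% CI = ({low:.1f}, {high:.1f})")
```

Whether a statistically significant slope can be given a causal interpretation is a separate matter that depends on the design of each strategy.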

Part D.

Suppose you have sales both near and not near to the plant as well as sales before and after the spill. Suggest an alternative strategy to estimate the effect of the chemical spill on housing prices that is preferable to both MA and MB.

Word Limit: 800 words for entire question (i.e., all subparts).

QUESTION 2

Imagine you work for a large department store, which highly values customer service. The following chart shows how customers contact the customer service centres.

 

You begin to discuss the chart with your manager. Immediately, she has the following query: “I want to see the overall trends, but it is difficult to see with all the seasonal spikes in the time series. I’d like a simpler view into the trend.” You decide to create some charts to address your manager’s query.
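As background to the manager's request, the Python sketch below shows one common way to separate a trend from seasonal spikes: a moving average whose window matches the seasonal period. The series is simulated and hypothetical (a linear trend plus a 12-month cycle); it is not taken from customer_service.xlsx, whose layout is not described here, and the function name is an assumption.

```python
import numpy as np


def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


# Hypothetical monthly contact counts: linear trend plus a 12-month cycle.
months = np.arange(48)
trend = 100.0 + 2.0 * months
seasonal = 30.0 * np.sin(2 * np.pi * months / 12)
contacts = trend + seasonal

# A 12-month window averages out a full seasonal cycle, leaving the trend.
smoothed = moving_average(contacts, 12)
print(smoothed[:3])
```

A window equal to the seasonal period is used because each average then covers exactly one full cycle, so the seasonal component cancels and only the trend remains.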

Part A.

Using the four frameworks typology, identify the type of chart you would use to address the query and explain why.

Part B.

Sketch two alternative charts for the query. For each chart, provide a brief explanation of your design choices. To sketch the chart, you can use any tool you want (e.g., a software tool like Infogram, Excel, or R). Alternatively, you can sketch the chart using pencils, pens or markers on paper, then take a picture of the charts and paste them into your solutions document. You can access the underlying data “customer_service.xlsx” on Ed.

Part C.

Evaluate your two charts and explain which you would select to further develop to present to your manager.

Word Limit: 500 words for entire question (i.e., all subparts).

QUESTION 3

As part of a preventive health program, we are interested in building a model for predicting diabetes among women. We have access to the following information for each woman:

- diabetes: Yes or No for diabetic.

- npreg: number of pregnancies.

- glu: plasma glucose concentration.

- bmi: body mass index.

- ped: diabetes pedigree function.

- age: age in years.

We have split the available data into a training dataset with 332 cases and a test dataset with 200 cases.

With reference to the above case, please answer the following questions:

Part A.

We have run a logistic regression to predict diabetes using npreg, glu, bmi, ped, and age. The R output from this logistic regression is as follows.

Based on the output, write down the mathematical equation of the logistic regression associated with this R output.

(2 marks)
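For reference, a logistic regression with these five predictors has the general form sketched below. The coefficients $\beta_0, \dots, \beta_5$ are written symbolically; their numerical estimates must be read from the R output. The sketch assumes that the modelled outcome is diabetes = Yes, which should be confirmed against the output.

```latex
\log\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1\,\text{npreg} + \beta_2\,\text{glu}
  + \beta_3\,\text{bmi} + \beta_4\,\text{ped} + \beta_5\,\text{age},
\qquad p = \Pr(\text{diabetes} = \text{Yes}).
```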

 

Part B.

Based on the output in Q3 Part A, provide an interpretation of the coefficients associated with bmi and npreg.

Part C.

Based on the output in Q3 Part A, identify the predictors that have a statistically significant relationship with the response, and justify your answer.

QUESTION 4

As an alternative to the logistic regression in Q3, we have also fitted the classification tree below:

 

Part A.

Based on this decision tree, would you predict that a woman with the following characteristics has diabetes: npreg=0, glu=140, bmi=47.9, ped=0.259, age=26? Justify your answer.

Part B.

Provide TWO facts to demonstrate the consistency of results from the classification tree above with the findings reported in Q3 Part A.

QUESTION 5

You are a data analyst for AppCo. AppCo produces a smartphone app that allows users to virtually try on clothes. It is funded by having sponsored links to online clothing retailers. AppCo tells its users, “top brands, all sizes, best prices”. AppCo has a dashboard that is used by Board members that shows sales, revenues and a comparison of sales by retailer. You have been asked to consider whether having a selection of the most popular sizes at a slightly lower price would increase revenue. The CEO has said, “well, just use some of your magic A/B testing …”.

Part A.

Identify potential legal issues that may arise from A/B testing if AppCo users are unaware the experiment is taking place.

Part B.

Evaluate whether A/B testing would lead to any legal consumer issues.

Part C.

Recommend steps to ensure that your organisation maintains good governance when developing analytics at AppCo.

Word Limit: 500 words for entire question (i.e., all subparts).