
MSIN0010 Data Analytics I Scenario Week

Scenario Week Group Project II: Thursday-Friday

Instructions

During the last two days of Scenario Week, you will be working with data from Dunnhumby, a customer data science company with headquarters in London. Dunnhumby serves clients in the grocery retail, retail pharmacy, and retail financial services industries. They help retailers and brands “perfect the science of shopping”.

For Retailers: “We help retailers achieve sustainable growth and smarter operations with Customer First strategies and advanced Customer Data Science solutions. We ensure that every decision starts and ends with the Customer in mind – creating loyalty, ramping up efficiency, driving sales, and positioning you to succeed.”

For Brands: “We help CPGs review performance, generate insights, and utilise them to increase sales, transform media campaigns, and win market share. Within this, we position account teams to improve collaboration with retail partners, give trade marketers the guardrails for successful strategies, and help brand marketers activate against what Shoppers want across multiple channels.”

Your specific data come from an American grocery retailer client, Kroger. There are three separate data files.

•    Transaction data (transactions.csv): Price and sales information for the top five products from the mouthwash, pretzels, frozen pizza, and boxed cereal categories. Data are provided at the product-store-week level. This data set is split into train.csv and test.csv, which will be used for one of the prediction tasks discussed below.

•    Product data (products.csv): Product characteristics, including the brand, the subcategory, and the category.

•    Store data (stores.csv): Store characteristics, including the city, the state, and store quality tier.
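Because the product and store characteristics live in separate files, most of the tasks below start by joining them onto the transaction rows. The following is a minimal sketch in base R, not a prescribed method. The tiny data frames stand in for the real files, and the key and attribute names (PRODUCT_ID, STORE_ID, WEEK, CATEGORY, CITY, TIER) and the city values are assumptions made for illustration; check the actual column names in your CSV files.

```r
# Toy stand-ins for the three files. In practice these would come from
# read.csv("transactions.csv"), read.csv("products.csv"), read.csv("stores.csv").
# All column names except UNITS and PRICE are ASSUMED for illustration.
transactions <- data.frame(
  PRODUCT_ID = c(1, 1, 2, 2),
  STORE_ID   = c(10, 20, 10, 20),
  WEEK       = c(1, 1, 1, 1),
  UNITS      = c(12, 7, 30, 25),
  PRICE      = c(3.49, 3.99, 1.99, 2.19)
)
products <- data.frame(
  PRODUCT_ID = c(1, 2),
  CATEGORY   = c("FROZEN PIZZA", "PRETZELS")
)
stores <- data.frame(
  STORE_ID = c(10, 20),
  CITY     = c("CITY A", "CITY B"),   # hypothetical values
  TIER     = c("UPSCALE", "VALUE")
)

# Left joins keep every transaction row even if a lookup entry is missing.
full <- merge(transactions, products, by = "PRODUCT_ID", all.x = TRUE)
full <- merge(full, stores, by = "STORE_ID", all.x = TRUE)

# A lookup join should never change the number of transaction rows.
stopifnot(nrow(full) == nrow(transactions))
print(full)
```

Checking the row count after each join is a cheap way to catch duplicated keys in a lookup table before they inflate the sales figures.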

Analysis

Task 1 (Descriptive Analysis): Use the transactions.csv data set to create graphs and/or tables that provide insights into ANY TWO of the following SIX questions.

a)   How do prices vary over time? Do certain categories exhibit more seasonal changes in prices than others?

b)   How does demand vary over time? Do certain categories exhibit more seasonal changes in demand than others?

c)   Are prices for a given product the same across all stores? Or do they vary regionally? Or by store type?

d)   Are promotions for a given product coordinated across all stores? Or do they vary regionally? Or by store type?

e)   Are prices in UPSCALE stores higher than prices in VALUE stores?

f)    In what city or state are PRIVATE LABEL goods most popular?
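As one possible starting point for question (e), the sketch below compares the mean price of each product across store tiers, so that differences in product mix between tiers do not distort the comparison. The data are simulated with a built-in 10% upscale premium purely so the table has something to show, and the column names PRODUCT_ID, STORE_ID, WEEK and TIER are assumptions, not names taken from the real files.

```r
set.seed(1)
# Simulated product-store-week rows; column names other than PRICE are ASSUMED.
toy <- expand.grid(PRODUCT_ID = 1:3, STORE_ID = 1:6, WEEK = 1:10)
toy$TIER <- ifelse(toy$STORE_ID <= 3, "UPSCALE", "VALUE")
base_price <- c(2.00, 3.50, 5.00)[toy$PRODUCT_ID]
# Build in a 10% premium for upscale stores (an assumption of the simulation).
toy$PRICE <- base_price * ifelse(toy$TIER == "UPSCALE", 1.10, 1.00) +
  rnorm(nrow(toy), sd = 0.05)

# Mean price for each product within each tier.
by_tier <- aggregate(PRICE ~ PRODUCT_ID + TIER, data = toy, FUN = mean)

# One row per product, one column per tier, plus the percentage gap.
wide <- reshape(by_tier, idvar = "PRODUCT_ID", timevar = "TIER",
                direction = "wide")
wide$PCT_GAP <- 100 * (wide$PRICE.UPSCALE / wide$PRICE.VALUE - 1)
print(wide)
```

The same aggregate-then-reshape pattern would work for the other questions by swapping the grouping variables, for example week and category for the seasonality questions.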

Task 2 (Measuring Price Elasticities): Use the transactions.csv data set to fit the following regression model (separately) for each category-city combination in the data.

log(UNITS) = α + β log(PRICE) + γ FEATURE + δ DISPLAY + ε

•    In this model, the parameter β represents the % change in demand given a 1% increase in price. This is the definition of a price elasticity!

•    Find a concise way to report the estimates of β (beta), γ (gamma), and δ (delta).

•    In the discussion of your results, be sure to comment on specific categories and/or cities that exhibit the most/least elastic demand, are most/least affected by feature advertising, and are most/least affected by display advertising. Also comment on whether the results appear reasonable or not.
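One way to fit the model separately for each category-city combination is to split the data into groups and collect the coefficients into a single table. The sketch below uses base R only (the broom and purrr packages offer a tidier route to the same result). It runs on simulated data with a known elasticity per group so that the estimates can be checked. CATEGORY and CITY are assumed column names, and the simulated coefficient values are arbitrary. Note that log(UNITS) is only defined for rows with UNITS greater than zero.

```r
set.seed(42)
# Simulated data with a known elasticity per group (values are arbitrary).
# CATEGORY and CITY are ASSUMED column names; UNITS, PRICE, FEATURE and
# DISPLAY follow the regression model stated in the brief.
groups <- expand.grid(CATEGORY = c("CEREAL", "PIZZA"), CITY = c("A", "B"),
                      stringsAsFactors = FALSE)
groups$TRUE_BETA <- c(-1.5, -2.5, -1.0, -3.0)

sim_one <- function(g, n = 500) {
  price   <- runif(n, 1, 5)
  feature <- rbinom(n, 1, 0.2)
  display <- rbinom(n, 1, 0.2)
  log_units <- 4 + g$TRUE_BETA * log(price) + 0.5 * feature +
    0.3 * display + rnorm(n, sd = 0.2)
  data.frame(CATEGORY = g$CATEGORY, CITY = g$CITY, UNITS = exp(log_units),
             PRICE = price, FEATURE = feature, DISPLAY = display)
}
toy <- do.call(rbind, lapply(seq_len(nrow(groups)),
                             function(i) sim_one(groups[i, ])))

# Fit the same log-log model within each category-city cell.
pieces <- split(toy, list(toy$CATEGORY, toy$CITY), drop = TRUE)
estimates <- do.call(rbind, lapply(pieces, function(d) {
  fit <- lm(log(UNITS) ~ log(PRICE) + FEATURE + DISPLAY, data = d)
  data.frame(CATEGORY = d$CATEGORY[1], CITY = d$CITY[1],
             beta  = coef(fit)[["log(PRICE)"]],
             gamma = coef(fit)[["FEATURE"]],
             delta = coef(fit)[["DISPLAY"]])
}))
rownames(estimates) <- NULL
print(estimates)
```

A table with one row per category-city cell and one column per coefficient is already a fairly concise report; a heat map or dot plot of beta by category and city is another option.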

Task 3 (Demand Forecasting): Use train.csv and test.csv to fit at least three different regression models with unit sales – i.e., UNITS not log UNITS – as the dependent variable.

Decide which predictor variables to include and use the data to choose the “best” model.

You must fit a linear regression model, a regression tree, and ANY ONE (or more) of the following: random forest, LASSO, or neural network.

Note: If you add columns to the training data set (i.e., predictor variables not from the transactions.csv data), you will need to also add those same columns to the test data set.
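The sketch below illustrates the workflow for two of the required models, a linear regression and a regression tree, compared by root mean squared error on the test set. It is an illustration under assumptions rather than a recommended specification: the data are simulated in place of train.csv and test.csv, the LOG_PRICE predictor is a hypothetical engineered column, test-set RMSE is just one reasonable way to define the “best” model, and the rpart package is assumed to be installed (it ships with standard R distributions). A random forest, LASSO, or neural network could be added to the same comparison table.

```r
library(rpart)  # regression trees; assumed to be installed

set.seed(7)
# Simulated stand-in for the real files (coefficients are arbitrary).
simulate <- function(n) {
  price   <- runif(n, 1, 5)
  feature <- rbinom(n, 1, 0.2)
  display <- rbinom(n, 1, 0.2)
  lambda  <- exp(3 - 0.8 * log(price) + 0.5 * feature + 0.3 * display)
  data.frame(UNITS = rpois(n, lambda), PRICE = price,
             FEATURE = feature, DISPLAY = display)
}
train <- simulate(800)  # stands in for read.csv("train.csv")
test  <- simulate(200)  # stands in for read.csv("test.csv")

# Any engineered predictor must be added to BOTH data sets in the same way.
add_features <- function(d) {
  d$LOG_PRICE <- log(d$PRICE)  # hypothetical extra column
  d
}
train <- add_features(train)
test  <- add_features(test)

# Dependent variable is UNITS, not log(UNITS), as the task requires.
fit_lm   <- lm(UNITS ~ LOG_PRICE + FEATURE + DISPLAY, data = train)
fit_tree <- rpart(UNITS ~ LOG_PRICE + FEATURE + DISPLAY, data = train,
                  method = "anova")

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
results <- data.frame(
  model     = c("linear regression", "regression tree"),
  test_rmse = c(rmse(test$UNITS, predict(fit_lm, newdata = test)),
                rmse(test$UNITS, predict(fit_tree, newdata = test)))
)
print(results[order(results$test_rmse), ])
```

Wrapping the feature engineering in one function and applying it to both data frames guards against the mismatch the note above warns about.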

The Deliverables

1. 2000-word report due Monday February 27th, 10am

2. An R script submitted separately as a .R file

Assessment

The project report (with R code) will constitute 20% of your overall mark for this module.

Structuring the Report (% total mark, recommended word count)

1.   Executive Summary (10%, 200 words)

2.   Industry Background and Data (10%, 200 words)

3.   Descriptive Analysis (30%, 600 words)

4.   Price Elasticity Estimates (30%, 600 words)

5.   Demand Forecasting (20%, 400 words)

Appendices (if necessary)

Note: Your target audience should not be me! Identify key business stakeholders (companies, managers, analysts, etc.) who would be interested in your analysis and write the report to them. I will be looking for good

Additional Resources

A library of charts in R: https://www.r-graph-gallery.com

Formatting dates

R library: lubridate (https://lubridate.tidyverse.org)

Regressions by group

R libraries: broom, purrr

https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html

Random forest

R library: randomForest

https://www.r-bloggers.com/how-to-implement-random-forests-in-r/

Lasso

R library: glmnet

https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html

Neural Networks

R library: neuralnet

https://www.r-bloggers.com/fitting-a-neural-network-in-r-neuralnet-package/