HW 3

Guidelines and Submission Instructions:

●   Whether you work on the assignment individually or as a team, you need to submit your Python notebook containing the answers to all questions. If you are working in a team, all teammates may submit the same, identical notebook. Please provide the names of all your teammates in the first notebook code cell.

●   To answer certain questions, you will write code, interpret results, or perform other tasks. All answers must be included in your Python notebook.

Case 1: Promotional activity on weekly sales

Your data science consulting firm has been hired to analyze the impact of promotional activity on sales. The company's accounting department has provided you with weekly sales data in the file sales.txt and the marketing manager has provided you with three additional files, promo1.txt, promo2.txt, and promo3.txt, containing information about three different types of promotions that they have been conducting. Note that each of these promo files contains more data points than the sales.txt file because the company has information about forthcoming (planned) promotional activity for the next 13 weeks. We define the future forecast horizon (h) as the length of this additional information.

Data: https://github.com/robertasgabrys/DSO424SPRING2023

1.  Import the data from all four files directly from my GitHub using Python.

2.  Display the promotional activity information graphically on the sales graph.

3.  Qualitatively assess the uplift in sales given specific types of promotion by examining the graph. Do different types of promotions have different impacts on sales?

4.  Build a regression model to predict weekly sales based on promotional activity, i.e., a model that predicts weekly sales from the three promotion variables.

5.  Are all three types of promotions effective? What are their effects on sales?

6.  Plot the sales, overlay the fitted values, and also display the promotional activity information graphically on the sales graph.

7.  You have an intern from DSO 424 who is assisting you with this project. During a meeting to discuss the initial model results, the intern shared their insights:

The graph created in the previous question highlights a significant negative lagging effect from the promotional activity. The sales significantly decrease in the periods following a promotion, and it is clear that our model has not captured this phenomenon.

The intern suggested using lags of our independent variables, i.e., the three promotion variables we used to build our first model. After investigating what the intern suggested, you decided to entertain their idea and build a new model with promos and their lags.

8.  Provide R^2 values for the previous and current models. Has the R^2 of the new model increased significantly? Briefly explain how you assess this.

9.  Plot the sales, overlay the fitted values, and display the promotional activity information graphically on the sales graph. Do you think this new model is better? Briefly describe.

10. You got excited and decided to include a one-period lead of each type of promotion. Build a model that has promos, lag, and lead of each promo.

11. Provide R^2 values for the previous and current models. Has the R^2 of the new model increased significantly? Briefly explain how you assess this.

12. Plot the sales, overlay the fitted values, and display the promotional activity information graphically on the sales graph. Do you think this new model is better? Briefly describe.

13. Calculate the RMSE, MAPE and sMAPE for all three models.

14. Using the champion model selected using MAPE, generate future forecasts and provide numerical forecasts along with 95% confidence intervals for each model.

15. On one graph, plot the data and overlay the forecasts of the three models.

16. Before formulating your final recommendation on which model the company should use, ask your intern to check whether a log transformation would help achieve better results. Build the same three models to predict log sales. When generating forecasts, undo the log transformation. Provide the RMSE, MAPE, and sMAPE for each model.

17. On one graph, plot the data and overlay the two champion models (use the MAPE values from Q13 and Q16 to determine the champions).

18. Out of all the models you considered, which one will you recommend the company use to predict future sales? Briefly explain your reasoning.
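The error measures requested in questions 13 and 16 can be sketched as below. This is a minimal illustration, not a prescribed solution: the function names and the toy numbers are assumptions made up for the example, and sMAPE has several definitions in the literature, so the variant shown (twice the absolute error divided by the sum of absolute actual and predicted values) may differ from the one used in class. For the log-sales models, the fitted values are exponentiated first so that all metrics are compared on the original sales scale.

```python
import numpy as np

def _as_arrays(actual, predicted):
    # Convert inputs to float arrays so the arithmetic below is elementwise.
    return np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)

def rmse(actual, predicted):
    # Root mean squared error, in the units of the data.
    a, p = _as_arrays(actual, predicted)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mape(actual, predicted):
    # Mean absolute percentage error, in percent. Assumes no actual value is zero.
    a, p = _as_arrays(actual, predicted)
    return float(100.0 * np.mean(np.abs((a - p) / a)))

def smape(actual, predicted):
    # One common sMAPE variant, in percent (an assumption; other definitions exist).
    a, p = _as_arrays(actual, predicted)
    return float(100.0 * np.mean(2.0 * np.abs(a - p) / (np.abs(a) + np.abs(p))))

# Toy numbers invented for illustration; they are not taken from sales.txt.
sales = np.array([100.0, 200.0])
log_fitted = np.log(np.array([110.0, 180.0]))  # stand-in for fitted values of a log-sales model
fitted = np.exp(log_fitted)                    # undo the log transformation before scoring

scores = {
    "RMSE": rmse(sales, fitted),
    "MAPE": mape(sales, fitted),
    "sMAPE": smape(sales, fitted),
}
print(scores)
```

Scoring every model on the original scale keeps the comparison between the sales models and the log-sales models fair, since errors measured on log sales are not in the same units.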

Your eager intern was all like "let's create more lags, woohoo!" But alas, the deadline for Project 1 was approaching faster than a cheetah on a sugar high, and there was simply no time to try out more models. Maybe next time we'll have a chance to indulge the intern's insatiable hunger for more lags.
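For readers unsure how the lag and lead regressors from questions 7 and 10 are constructed, here is a minimal sketch using pandas `shift`. The helper name `add_lag_lead`, the column name `promo1`, and the toy indicator values are all hypothetical; the real promo files may be laid out differently.

```python
import pandas as pd

# Toy promotion indicator invented for illustration; not the contents of promo1.txt.
promos = pd.DataFrame({"promo1": [0, 1, 0, 0, 1, 0]})

def add_lag_lead(df, columns, lags=(1,), leads=(1,)):
    """Return a copy of df with lagged and lead versions of the given columns."""
    out = df.copy()
    for col in columns:
        for k in lags:
            # shift(k) moves values down: row t holds the promo value from week t - k.
            out[f"{col}_lag{k}"] = df[col].shift(k)
        for k in leads:
            # shift(-k) moves values up: row t holds the promo value from week t + k.
            out[f"{col}_lead{k}"] = df[col].shift(-k)
    return out

features = add_lag_lead(promos, ["promo1"])
print(features)

# Rows where a lag or lead is undefined contain NaN and are dropped before fitting.
usable = features.dropna()
print(len(usable))
```

Shifting introduces missing values at the start (lags) and the end (leads) of the series, so a model with lags and leads is fitted on slightly fewer weeks than the first model. That difference is worth keeping in mind when comparing R^2 values across models.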

Case 2: ProLobsters

ProLobsters is a company that ships fresh Maine lobsters directly from the ocean to customers across the United States. All orders are shipped via FedEx Overnight or 2ndDay on the date specified by the customer. ProLobsters acquires new customers through daily spots on the Food Channel and also has a strong repeat business. Additionally, the company sends out three promotional emails to its entire customer base each month. The sales data for July can be found in the ProLobsters.xlsx file, which is posted on my GitHub. The email drop dates are highlighted in blue.

Data: https://github.com/robertasgabrys/DSO424SPRING2023

19. Import the data from ProLobsters.xlsx into Python and assign it to a variable called df.

20. Visualize the orders, shipments, new customers, and retention orders, and interpret the patterns seen in each graph.

21. Using the information in ProLobsters.xlsx, create a forecast for August for:

a.   daily orders,

b.   shipments,

c.   new customers,

d.  and retention orders.

You may try different models, but please include only your final models for each of the above items in your submission.

22. Create a graph of the data for each series and overlay the fitted values, August forecasts, and prediction intervals.

23. Report the historical RMSE, MAPE, and sMAPE for the training set.

24. Report the correlation coefficient between actual and predicted values.

25. What is the interpretation of the squared correlation coefficient?

26. Do you consider that your modeling approach presents an accurate picture of current and future data patterns?

27. The marketing department manager has requested that you evaluate the effectiveness of their email marketing strategy. Write a brief memo to the marketing manager and provide a data-driven assessment of the effectiveness of their marketing strategy.
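As background for questions 24 and 25, the sketch below computes the correlation between actual and fitted values on a toy series and compares its square with R^2. The data and the straight-line model are invented for illustration and are not the ProLobsters data. The equality shown holds for in-sample fitted values from a least-squares linear model with an intercept; it is not guaranteed for other model types or for out-of-sample forecasts.

```python
import numpy as np

# Toy series invented for illustration; not taken from ProLobsters.xlsx.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 18.0])

# Least-squares straight line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

# Correlation coefficient between actual and fitted values.
r = float(np.corrcoef(y, fitted)[0, 1])

# R^2 computed from sums of squares.
sse = float(np.sum((y - fitted) ** 2))
sst = float(np.sum((y - y.mean()) ** 2))
r_squared = 1.0 - sse / sst

print(r, r ** 2, r_squared)
```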