
AD654: Marketing Analytics

Assignment 2: Market Segmentation & Conjoint Analysis

For this assignment, you will need two files: food_trucks.csv and woodie.csv, each of which can be found on our course Blackboard page.

For Parts I & II of this assignment, you will upload two files into Blackboard: the .ipynb file that you create in Jupyter Notebook, and an .html file that was generated from your .ipynb file. If you run into any trouble with submitting the .html file to Blackboard, you can submit it as a PDF instead, or just submit both as a ZIP. Lobster Land management prefers a PDF plus an .ipynb, so that the submission can be directly read in Blackboard.

For any question that asks you to perform some particular task, you just need to show your input and output in Jupyter Notebook or Colab. Tasks will always be written in regular, non-italicized font.

For any question that asks you to include interpretation, write your answer in a Markdown cell in Jupyter Notebook. Any homework question that needs interpretation will be written in italicized font. Do not simply write your answer in a code cell as a comment, but use a Markdown cell instead.

Remember to be resourceful! There are many helpful resources available to you, including the video library, the lecture notes on Blackboard, recitation sessions with the course TAs, the office hours sessions, and the web.

Part I: Segmentation (5 points)

I.     In recent years, Lobster Land has allowed a limited number of food trucks to enter the park during each day of operations.

Bring the dataset titled food_trucks.csv into your local environment in Jupyter Notebook. This dataset contains 248 rows. Each row contains information about one food truck vendor who is potentially interested in gaining access to Lobster Land at some point during the 2023 summer season. Since Lobster Land can only grant access to a limited number of food trucks at a time, they are hoping that your clustering skills can help them to sort through this data and get a better sense of how to proceed.

Since we’re doing k-means clustering here, we’ll only be able to use numeric variables as inputs – therefore, some things that might seem important (like the name, theme, or food type associated with the trucks) are unknown to us from this dataset.
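To make the numeric-input requirement concrete, here is a minimal sketch of the Euclidean distance that k-means relies on. The two vendor rows and their values are invented for illustration and do not come from food_trucks.csv; the helper name euclidean is also just an assumption for this example.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical vendors: (avg_transaction_cost, bev_percent, dist_lobland)
vendor_a = (12.0, 30.0, 5.0)
vendor_b = (15.0, 34.0, 5.0)

d = euclidean(vendor_a, vendor_b)
print(d)  # 5.0, because sqrt(3**2 + 4**2 + 0**2) = 5
```

A text label such as a truck's theme has no meaningful subtraction, which is why it cannot enter this calculation directly.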

 

vendorID

Each vendor in the dataset has a customerID number from 1 to 654. This was done so that vendor data can be tracked.

avg_transaction_cost

The average customer expenditure per transaction at this particular food truck.

mnths_operational

This shows how long this particular food truck has been operational, as measured in months.

days_yr

This is the number of days per year that the food truck vendor plans to operate this truck.

avg_cost_item

This is the average customer expenditure per item sold at this particular food truck.

dist_lobland

This is the distance, in miles, between the vendor’s home address and the front gate to Lobster Land.

number_trucks

This is the total number of food trucks currently operated & managed by the vendor.

bev_percent

This is an estimate of the percentage of sales made at this food truck that are attributable to beverages (as opposed to food).

A.  Drop the vendorID variable.

a.   Why will vendorID not be relevant in a clustering model? In your answer, do not just write “it will confuse the model.” Instead, take the time to explain this with a sentence or two, using a bit of math and your understanding of Euclidean distance.

B.  Call the describe() function on your dataset.

a.   How does this function help you to gain an overall sense of the columns and values in this (or any other) dataset? Why is this valuable for any analyst who will use a dataset to build a model?

C.  Missing values/impossible values

a.   Does this dataset contain any missing values? If so, how many? Which columns have missing values?

b.  What about impossible values? Do you see any impossible values here? If so, handle them in any way that you see fit. Why did you take this approach?

D.  Data scaling.

a.   Do your variables need to be standardized? Why or why not?

b.  If your data requires standardization, use Python to convert your values into z-scores, and store the normalized data in a new dataframe. If not, proceed to the next step without changing the variables.

E.   Elbow chart.

a.   Build an elbow chart to help give you a sense of how you might build your model.

F.   How many clusters will you use for your k-means model? (Remember, as noted in several places throughout the course material, there is no “right” answer to this question. You may wish to answer this immediately after seeing your elbow plot, or after doing some more experimentation.)

G.  Build a k-means model with your desired number of clusters.

H.  Generate and show summary statistics about each of your clusters.

I.    Build any four simple visualizations to help management better understand your clusters (a simple visualization could be a histogram, a barplot, a scatterplot, etc.; it should show original variables from the dataset). You may wish to facet your visualizations by cluster.

For each one of your visualizations, include 2-3 sentences of description/explanation. What does it show about your model?

J.   Give a descriptive name to each one of your clusters, along with a few sentences of explanation for the name that you chose. As you describe each segment, write a bit about the types of vendors likely to belong to each group.

K.  Finally, how can Lobster Land use this model? In a paragraph of at least 4-5 sentences, identify some ways in which Lobster Land could benefit from having this model. How can this be used? For Step K, apply some business sense and some creativity.
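The mechanics of the steps in Part I can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (it is not food_trucks.csv), the column names id, price, and distance are made up, StandardScaler is just one way to compute z-scores, and k = 2 is chosen because the toy data was built with two groups, not because it is the answer for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with two built-in groups (NOT food_trucks.csv).
rng = np.random.default_rng(654)
demo = pd.DataFrame({
    "id": range(1, 61),
    "price": np.concatenate([rng.normal(8, 1, 30), rng.normal(20, 1, 30)]),
    "distance": np.concatenate([rng.normal(5, 1, 30), rng.normal(40, 3, 30)]),
})

features = demo.drop(columns="id")   # drop the identifier before clustering
print(features.describe())           # ranges, means, and spread per column
print(features.isna().sum())         # count of missing values per column

# Convert each column to z-scores and keep the result in a new dataframe.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(features),
    columns=features.columns,
    index=features.index,
)

# Elbow data: within-cluster sum of squares (inertia) for k = 1..6.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    inertias[k] = model.inertia_
print(inertias)

# Fit one candidate model, then summarize clusters in the original units.
final = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
features_labeled = features.assign(cluster=final.labels_)
summary = features_labeled.groupby("cluster").mean()
print(summary)
```

Plotting the inertias values against k with any plotting library gives the elbow chart. Summarizing on the unscaled columns keeps the cluster descriptions in units that management can read.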

Part II:  Conjoint Analysis with a Linear Model (4 points)

 

With the 2023 season just a few months away now, Lobster Land is considering the addition of a new ride in the park. Specifically, Lobster Land is thinking about whether to add a wooden roller coaster, whose track could encircle the other rides currently operating within the park. A wooden roller coaster, sometimes called a “woodie”, is an older type of ride that has regained popularity among roller coaster enthusiasts in recent years.

To gather more information before moving ahead, the park conducted some survey research. They asked a general sample of the population near Portland, Maine about their preferences for wooden roller coasters. Each survey respondent saw a random sample of 5 possible options, or bundles, and was asked to rate those bundles from 1-10. By giving this survey to many thousands of people, Lobster Land was able to generate this dataset.

The woodie.csv dataset contains 288 rows -- one for each of the unique feature combinations that the park tested. It also contains average ratings for each combination.

Park management needs your help! Of course, the park could just rank the combinations to quickly see which combination was most popular overall among respondents, but they are hoping that you can do some conjoint analysis to help them gain deeper, more meaningful insights about people’s preferences regarding particular features and options.

This dataset contains the following variables:

bundleID

This is a series of sequential integers from 1 to 288.

start_high

The options here are either “Yes” or “No.” A “Yes” option refers to a roller coaster whose riders begin the ride at a high altitude, so that the first drop can occur without a preceding slow climb upward. A “No” option refers to a more traditional roller coaster, which starts at a low level, and undergoes a slow climb, before making its big drop.

maxspeed

Users had three options for maxspeed, which is the maximum speed in miles per hour (mph) reached by the roller coaster during the ride. The options were 40mph, 60mph, and 80mph.

steepest_angle

The two options here are either 50 or 75. This refers to the number of degrees associated with the steepest drop on the ride. To get a sense of how steep a 75-degree drop is, you may want to do a Google image search.

seats_car

The roller coaster designers have indicated that each “car” can be constructed with either two seats or four seats.

drop

This is the size of the largest vertical drop during the ride. Options were 100 feet, 200 feet, or 300 feet.

track_color

The four options that survey respondents saw here were green, blue, white, and red.

avg_rating

This is the average rating that the bundle received, on a score from 0 to 10.
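As a quick arithmetic check, the attribute levels listed above multiply out to exactly the 288 bundles in the dataset: 2 × 3 × 2 × 2 × 3 × 4 = 288. The sketch below enumerates them. The dictionary keys seats_car and drop are assumed spellings of column names that appear garbled in this handout, and the numeric encodings of the levels are chosen only for illustration.

```python
from itertools import product

# Attribute levels as listed in the variable descriptions above.
levels = {
    "start_high": ["Yes", "No"],
    "maxspeed": [40, 60, 80],
    "steepest_angle": [50, 75],
    "seats_car": [2, 4],            # assumed column name
    "drop": [100, 200, 300],        # assumed column name
    "track_color": ["green", "blue", "white", "red"],
}

# Every unique combination of one level per attribute is one bundle.
bundles = list(product(*levels.values()))
print(len(bundles))  # 288 = 2 * 3 * 2 * 2 * 3 * 4
```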

A.  Read the dataset woodie.csv into your local environment in Jupyter Notebook.

B.  Based on the descriptions shown above, which of your variables are numeric, and which are categorical? (The standard you should use when answering this is that something that is both represented by a number, and for which that number has valid mathematical meaning, is numeric).

C.  After first removing the bundleID variable, use the pandas get_dummies() function in order to prepare the remaining variables for use in a linear model. Inside this function, include this argument: drop_first = True. Doing this will save us from the multicollinearity problem that would make our model unreliable. Be sure to dummify ALL of your input variables, even the numeric ones.

a.   Why should the numeric input variables based on this survey data be dummified?

D.  Build a linear model with your data, using the average rating as the outcome variable, and with all of your other variables as inputs.

E.  Display the coefficient values of your model inputs.

F.   Now, deliver some paragraphs of analysis for Lobster Land management about what your model is showing you.

It would be good here to include some detail about which features seemed to be most popular, or least popular, among respondents. However, a truly thoughtful answer to this question will go beyond simply listing the coefficients in order of popularity. What other insights can you draw from this? Is there anything else you would want to consider before simply recommending that Lobster Land implement the ‘most popular’ options? Remember, Lobster Land hired you as a consultant -- don’t be afraid to show some creativity here.

“I can’t answer this because I’m not sure what this variable means” = NOT the way to go here. If you aren’t sure about a particular variable, ask your Professor or one of your TAs.

You can use either statsmodels or scikit-learn to build the model. If you use statsmodels, you may see high p-values for individual levels of categorical variables – but keep all the variables you used at Step D.
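The dummy-coding and modeling steps in Part II can be sketched on a toy example. Everything here is an assumption for illustration: the two attributes (speed, color), their levels, and the ratings are invented, and the ratings are built from known part-worths so that the recovered coefficients are easy to verify. The sketch uses scikit-learn; statsmodels would work as well.

```python
import pandas as pd
from itertools import product
from sklearn.linear_model import LinearRegression

# Toy conjoint data (NOT woodie.csv): every combination of two made-up
# attributes, with ratings generated from known additive effects.
rows = list(product([40, 60], ["blue", "green", "red"]))
toy = pd.DataFrame(rows, columns=["speed", "color"])
speed_effect = {40: 0.0, 60: 1.5}
color_effect = {"blue": 0.0, "green": -0.5, "red": 1.0}
toy["rating"] = 5.0 + toy["speed"].map(speed_effect) + toy["color"].map(color_effect)

# Dummy-code every input, including the numeric-looking one, and drop the
# first level of each attribute so that it serves as the baseline.
X = pd.get_dummies(
    toy[["speed", "color"]],
    columns=["speed", "color"],
    drop_first=True,
    dtype=float,
)
y = toy["rating"]

model = LinearRegression().fit(X, y)
coefs = pd.Series(model.coef_, index=X.columns)
print(coefs)             # each value is relative to the dropped baseline level
print(model.intercept_)  # predicted rating for the all-baseline bundle
```

In this toy setup the coefficient on speed_60 recovers the 1.5-point effect relative to the 40 baseline, which is how each dummy coefficient should be read: as a difference from the dropped level of the same attribute.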

Part III: Wildcard: Marketing & Segments (1 point) 

A.  Find ANY advertisement…ANYWHERE. As you walk around in your daily life, you might look for an ad on the side of the T, on a bus stop, on a poster, etc. Alternatively, you could use an advertisement that you encounter while browsing the web.

a.   Take a picture of the ad that you see (if it’s in the ‘real world’). Or, if the ad you select is online, take a screenshot from your phone or your laptop to capture this advertisement.

b.  Write ONE thoughtful paragraph that addresses the issue of segmentation. What consumer segment is your ad targeting? What makes you think this? What types of consumers are in the segment? Are you part of the segment? Or, alternatively, is your ad an undifferentiated (mass market) ad? Finally, what is your opinion of this ad – is it effective?

You can embed your image, along with your paragraph write-up, in a Markdown cell in Jupyter Notebook. Alternatively, you could upload your image and paragraph in a separate file, such as a Word doc. The ad can be in any language – but if it’s not in English, please translate.