Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Workshop 9:

Cluster Analysis

Question 1

Adelaide Financial Advisers (AFA) maintains a customer database whose sample is shown in the DemoKTC data set.  AFA would like to segment customers into several groups so that customers within a group are similar with respect to key characteristics.  The customer data and their definition included in the database are appended below:

❼ Age = age of the customer in whole years

❼ Female = 1 if female, 0 if not

❼ Income = annual income in dollars

❼ Married = 1 if married, 0 if not

❼ Children = number of children

❼ Loan = 1 if customer has a car loan, 0 if not

❼ Mortgage = 1 if customer has a mortgage, 0 if not

Using matching distance to compute dissimilarity between observations, apply hierarchical clustering employing‘complete’linkage to the data in DemoKTC to create three clusters based on the Female, Married, Loan, and Mortgage variables. Report the characteristics of each cluster including the total number of customers in each cluster as well as the number of customers who are female, the number of customers who are married, the number of customers with a car loan, and the number of customers with a mortgage in each cluster.  How would you describe each cluster?

Question 2

Paul Mater works for the supply-chain management division of Trader Joe’s, a national chain of specialty grocery stores.  Trader Joe’s is considering a redesign of its supply chain.  Paul knows that Trader Joe’s uses frequent truck shipments from its distribution centers to its retail stores. To keep costs low, retail stores are typically located near a distribution center. The file TraderJoes contains data on the location (longitude and latitude) of Trader Joe’s retail stores. Paul would like to use K-means clustering with k=8 to estimate the preferred locations for a proposal to use eight distribution centers to support its retail stores. If Trader Joe’s establishes eight distribution centers, how many retail stores does the k-means approach suggest assigning to each distribution center?

Question 3

Boileau Adelaide (BA) employs a network of expert analytics consultants for various projects. To help it determine how to distribute its bonuses, BA wants to form groups of employees with similar performance according to key performance metrics.  Each observation (corresponding to an employee) in the file BigBlue consists of values for UsageRate which corresponds to the proportion of time that the employee has been actively working on high-priority projects, Recognition which is the number of projects for which the employee was specifically requested, and Leader which is the number of projects on which the employee has served as project leader. Normalize the values of the input variables to adjust for the different magnitudes of the variables. How many clusters do you recommend to categorize the employees? Why? Examine how high achieving the employees in each of the clusters are.