
Population Attitudes – Clustering Lab

Numerous studies have attempted to analyze the attitudes of populations and to identify which countries have the most content and happiest populations. Typically, the happier the population, the more productive the workforce, the higher the investment in the community, and the stronger the loyalty to the country. As such, gauging how happy its population is matters a great deal to a country.
As might be expected, it is difficult to accurately calculate a country's level of happiness. A number of efforts have been made to use various drivers to draw conclusions about which countries are happiest.

The data set Clustering Lab.jmp contains this data. In general, the higher a value, the better the country was perceived on that variable. The dataset contains the variables below:

Country – The name of the country

Score – The sum of Investment Potential, Diet, Freedom of Press, Recreation, Education System, Healthcare, Infrastructure Development, Currency Stability, Transportation, Political Stability

Investment Potential – Perceived attractiveness of investing in the country (0.00 – 1.00)

Diet – Perceived quality of the diet of the population (0.00 – 1.00)

Freedom of Press – Perceived freedom of press (0.00 – 1.00)

Recreation – Perceived ability to engage in recreational activities (0.00 – 1.00)

Education System – Perceived quality of education (0.00 – 1.00)

Healthcare – Perceived accessibility to healthcare facilities (0.00 – 1.00)

Infrastructure Development – Perceived level of infrastructure development (0.00 – 1.00)

Currency Stability – Perceived confidence in currency (0.00 – 1.00)

Transportation – Perceived accessibility to transportation (0.00 – 1.00)

Political Stability – Perceived political stability (0.00 – 1.00)
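
Since Score is defined as the sum of the ten indicators, each on a 0.00 to 1.00 scale, it can range from 0 to 10. The short Python sketch below illustrates that definition only. The row values are made up for a hypothetical country and are not taken from Clustering Lab.jmp, and the names INDICATORS, example_row and total_score are assumptions introduced for this illustration.

```python
# The ten indicator columns listed above.
INDICATORS = [
    "Investment Potential", "Diet", "Freedom of Press", "Recreation",
    "Education System", "Healthcare", "Infrastructure Development",
    "Currency Stability", "Transportation", "Political Stability",
]

# Hypothetical values for one made-up country (NOT from the lab data set).
example_row = {
    "Investment Potential": 0.60, "Diet": 0.70, "Freedom of Press": 0.50,
    "Recreation": 0.80, "Education System": 0.65, "Healthcare": 0.75,
    "Infrastructure Development": 0.55, "Currency Stability": 0.45,
    "Transportation": 0.70, "Political Stability": 0.60,
}

def total_score(row):
    """Score = sum of the ten indicators, so it lies between 0 and 10."""
    return sum(row[name] for name in INDICATORS)

score = total_score(example_row)
print(round(score, 2))
```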

Using k-Means clustering, answer the following questions:

1. If the data set also asked respondents to name their favorite feature of a country (e.g. mountains, beaches, activities, etc.), would this be helpful for clustering the data?

2. Using a cluster model with 6 clusters, which cluster has the highest score on average? How many data points are in that cluster?

3. What are the Cluster Means of this cluster?

4. Which cluster has the highest number of data points? How many data points are in that cluster?

5. Let’s assume that we want to use the statistically best number of clusters. If we tried a range of 3 to 9 clusters, which number of clusters would be best to use? Why?

6. The 7-cluster, 8-cluster and 9-cluster models each contain a cluster with only 1 data point. Which country is this? What does it mean when a cluster has only one data point in it?

7. Does the number of clusters matter? For example, explain in simple terms what it would mean if we used 3 clusters as opposed to 23 clusters.
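
For readers who want to see what k-means does outside of JMP, the following is a minimal Python sketch of the basic procedure (assign each point to its nearest cluster mean, recompute the means, repeat). It runs on synthetic two-column data, not on Clustering Lab.jmp, and the function names kmeans, sq_dist and wcss are assumptions made for this illustration. JMP's own implementation may differ in initialization, scaling and the fit statistic it reports; here the within-cluster sum of squares is used as a stand-in measure of fit.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, max_iter=100):
    """Basic k-means (Lloyd's algorithm). Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the result is stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def wcss(points, labels, centroids):
    """Within-cluster sum of squares (lower means tighter clusters)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Synthetic data: three loose groups of 20 points each (NOT the lab data).
rng = random.Random(42)
points = [
    (rng.gauss(cx, 0.05), rng.gauss(cy, 0.05))
    for cx, cy in [(0.2, 0.2), (0.5, 0.8), (0.8, 0.3)]
    for _ in range(20)
]

labels, centroids = kmeans(points, k=3)
sizes = [labels.count(c) for c in range(3)]
print("cluster sizes:", sizes)
print("cluster means:", [[round(v, 2) for v in c] for c in centroids])

# Fit for a range of cluster counts.
fit_by_k = {k: wcss(points, *kmeans(points, k)) for k in range(1, 7)}
for k, value in fit_by_k.items():
    print(k, round(value, 3))
```

The cluster sizes and cluster means printed here correspond to the counts and Cluster Means asked about above. Within-cluster sum of squares tends to keep falling as more clusters are added, so on its own it cannot pick a best number of clusters; one usually looks for the point where the improvement levels off.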

For questions 8, 9, 10 and 11, use hierarchical clustering (with all variables except ‘Country’).

8. Compare the distance in going from a 3 cluster model to a 4 cluster model, with the distance in going from a 4 cluster model to a 5 cluster model. Based on your understanding of distance and length of the lines in the dendrogram, which model (3 clusters or 4 clusters) would be best to use? Why?

9. Create a model with 5 clusters. How many data points are in each cluster? What are the mean scores of each cluster?

10. Are there any areas in which the cluster with the lowest score performs better than the cluster with the highest score?

11. A data analyst suggests that if a country wants happier residents, it should invest heavily in healthcare, as this leads to a higher level of investment potential, and therefore happier residents. Do you agree?

12. An analyst suggests that a lower number of clusters gives the strongest and most meaningful result, as opposed to a large number of clusters. Comment on whether this suggestion is valid, using a 1-cluster, 15-cluster, and 152-cluster model as a basis for comparison.

13. Compare the clustering techniques discussed in this lab with the classification techniques discussed in the previous lab.

14. Compare the cluster means of the k-means clustering model with 3 clusters to the cluster means of the hierarchical clustering model with 3 clusters. Which model do you believe is more descriptive of the data?

15. Compare the cluster means of the k-means clustering model with 4 clusters to the cluster means of the hierarchical clustering model with 4 clusters. Which model do you believe is more descriptive of the data?

16. Which clustering model and number of clusters do you believe is the most descriptive of the data? Why?
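
The idea of "distance" between cluster levels in a dendrogram, which the hierarchical clustering questions rely on, can be illustrated with a small Python sketch. It builds clusters bottom-up and records the distance at which each merge happens, which is what the length of the dendrogram lines represents. The seven points are made up, average linkage is an assumption chosen for simplicity (the linkage method used in JMP may differ), and the names average_linkage and agglomerate are hypothetical.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b):
    """Mean pairwise distance between the points of two clusters."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

def agglomerate(points):
    """Merge the two closest clusters until one remains.

    Returns {k: distance of the merge that left k clusters}.
    """
    clusters = [[p] for p in points]
    join_distance = {}
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = average_linkage(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        join_distance[len(clusters)] = d
    return join_distance

# Made-up points: three tight pairs plus one distant point.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1), (20, 0)]
join_distance = agglomerate(points)

for k in sorted(join_distance):
    print(f"merge that leaves {k} clusters happens at distance "
          f"{join_distance[k]:.3f}")
```

A large jump in merge distance between two consecutive levels suggests that the merge forced together clusters that were far apart, which is one common argument for stopping just before that jump.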