
RSM 338

Assignment #2: Country Risk Case Study

The files for the Country Risk case are available at www-2.rotman.utoronto.ca/~hull; direct links to the files are also provided in Quercus.

You are required to:

(a) Extract the *.ipynb file from the *.zip file and download the Country Risk 2019 Data.csv file. Put the two files in the same directory/folder and run the notebook.

(b) Search for KMeans using Google and look at the scikit-learn documentation. You will see that n_init is the number of times the algorithm is run with different initial cluster centers; the run with the lowest inertia is kept. The default value for n_init was 10 in scikit-learn releases before 1.4 (from 1.4 onward the default is 'auto'). Try several different values of n_init (e.g., 2, 20, 50 and 100) and see whether the countries in the high-risk cluster change. (1.5 pts)
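
As a hint for part (b), the sketch below shows one way to compare the high-risk cluster across n_init values. It is only an illustration under stated assumptions: the data are random numbers standing in for the case data, the features are standardized, random_state is fixed for repeatability, and the helper high_risk_members is a hypothetical name. It also assumes the first column is the peace index and that a larger value means less peaceful, so the high-risk cluster is taken to be the one with the largest centroid on that column. Cluster label numbers are arbitrary from run to run, so the comparison is made on membership rather than on label values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the case data (121 rows; columns: peace, legal, GDP growth).
# These are random numbers, NOT the values in Country Risk 2019 Data.csv.
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(121, 3)))


def high_risk_members(X, n_init, k=3, seed=0):
    """Return (row indices of the assumed high-risk cluster, inertia of the fit)."""
    km = KMeans(n_clusters=k, n_init=n_init, random_state=seed).fit(X)
    # Assumption: column 0 is the peace index and larger means less peaceful,
    # so the cluster with the largest centroid on column 0 is treated as high risk.
    risky = int(np.argmax(km.cluster_centers_[:, 0]))
    members = frozenset(np.flatnonzero(km.labels_ == risky).tolist())
    return members, km.inertia_


baseline, _ = high_risk_members(X, n_init=10)
for n_init in (2, 20, 50, 100):
    members, inertia = high_risk_members(X, n_init)
    changed = len(members ^ baseline)  # symmetric difference with the n_init=10 run
    print(f"n_init={n_init:>3}  inertia={inertia:.2f}  rows differing from n_init=10: {changed}")
```

With the real data, the row indices would be replaced by country names, and the same set comparison shows which countries enter or leave the high-risk cluster.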

(c) Set n_init back to 10. Carry out k-means clustering for k=3 with all four features (corruption index, peace index, legal risk index, and GDP growth rate). Compare the countries that are in the high-risk cluster with those that are in the high-risk cluster when only three features are used. (2.5 pts)

(d) The AgglomerativeClustering class in sklearn.cluster can be used to carry out hierarchical clustering. (Use the statement “from sklearn.cluster import AgglomerativeClustering”.) Determine three clusters from the peace index, legal risk index, and GDP growth rate. Compare the countries that are in the high-risk cluster with those that are in the high-risk cluster when the k-means algorithm is used. Try different measures of closeness (referred to as “linkage” in the class). (3.5 pts)
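
As a hint for part (d), the sketch below fits AgglomerativeClustering with each of the four linkage options and compares the resulting partition with a k-means partition. It is a minimal illustration, not a model answer: the data are random numbers standing in for the three features, standardization and the fixed random_state are assumptions, and the adjusted Rand index is used here only as one convenient way to compare two partitions whose label numbers need not match.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three features (peace, legal, GDP growth).
# These are random numbers, NOT the values in Country Risk 2019 Data.csv.
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(121, 3)))

# Reference partition from k-means with three clusters.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

linkage_labels = {}
for linkage in ("ward", "complete", "average", "single"):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X)
    linkage_labels[linkage] = labels
    sizes = np.bincount(labels, minlength=3).tolist()
    # 1.0 means the partition is identical to the k-means one; values near 0 mean little agreement.
    ari = adjusted_rand_score(kmeans_labels, labels)
    print(f"{linkage:>8}: cluster sizes {sizes}, agreement with k-means (ARI) {ari:.2f}")
```

Looking at the cluster sizes for each linkage is a quick way to see whether a linkage produces balanced clusters or isolates a few points on their own.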

(e) Venezuela is not included in the 121 countries used in the country risk case study described in this chapter. Its feature values are extreme. The corruption index, peace index, legal risk index, and real GDP growth rate are 16, 2.671, 2.895, and −35%, respectively. Try adding Venezuela to the countries used in the three-feature analysis. How do the results change? What do your results suggest about the sensitivity of k-means to outliers? (2.5 pts)
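
As a hint for part (e), the sketch below appends one extreme row to a data set, refits k-means, and reports how the cluster sizes change. It is an illustration under stated assumptions: the 121 base rows are random numbers with arbitrarily chosen means and spreads, not the case data; the Venezuela row uses the peace, legal risk, and GDP growth values quoted in part (e), with growth entered in percent (an assumption about the units in the file); and the helper fit_labels is a hypothetical name. The features are re-standardized after the row is added, which is itself one of the ways an outlier can influence the result.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 121 countries (columns: peace, legal, GDP growth in percent).
# The means and spreads are arbitrary choices, NOT taken from the case data.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(2.0, 0.45, 121),
    rng.normal(5.5, 1.4, 121),
    rng.normal(2.5, 2.0, 121),
])

# Feature values quoted in part (e); growth entered as -35.0 (percent units assumed).
venezuela = np.array([[2.671, 2.895, -35.0]])


def fit_labels(data, k=3, seed=0):
    """Standardize the data, then return k-means cluster labels."""
    scaled = StandardScaler().fit_transform(data)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)


labels_without = fit_labels(X)
labels_with = fit_labels(np.vstack([X, venezuela]))

outlier_cluster = labels_with[-1]  # cluster assigned to the appended row
others_in_cluster = int(np.sum(labels_with == outlier_cluster)) - 1
print("cluster sizes without the extra row:", np.bincount(labels_without, minlength=3).tolist())
print("cluster sizes with the extra row:   ", np.bincount(labels_with, minlength=3).tolist())
print("other rows sharing the extra row's cluster:", others_in_cluster)
```

If the appended row ends up with few or no other rows in its cluster, the remaining countries are effectively being split into only two groups, which is the kind of effect the question asks you to discuss.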

Numbers in brackets refer to the marks assigned to each question. The code is worth 5 points and the report is also worth 5 points.

You must submit:

- Python notebook (*.ipynb and *.html)

- Short report (maximum two pages, Arial 12 pt, single spacing) summarizing your findings. Simply reporting the results obtained in your Jupyter notebook is considered insufficient; a brief explanation of the results is required for full marks.