STATISTICS 452/652: Introduction to Statistical Learning

Posted: 2022-11-27

October 5, 2022

FINAL PROJECT

Due Date: Nov 30. Submit your report as a PDF to Crowdmark.

POLICY

1. This project is to be completed independently. You may use whatever class materials you wish in completing this assignment, BUT DO NOT DISCUSS RESULTS WITH ANYONE ELSE, WITHIN OR OUTSIDE OF THE CLASS. Failure to follow this directive will result in a failing grade.

2. Late projects will be accepted with a penalty of 2 points per hour (the project is worth 100 points).

3. You may ask for clarification of the project requirements, but you should not show or discuss your answers with the TA or instructor, or seek feedback on them.

4. The deadline for submission is Nov 30. Details of how to submit the answer will be posted later.

ASSIGNMENT

Statistical learning is an expanding field of study. One of the key transferable skills is the ability to dive deeply into a topic on your own, reading web documents and online tutorials to learn a new machine learning technique. You will choose an example data set and write a report that is both analytical and instructional in nature. You will study the procedures below on your own and learn to apply them in R.

1. K-means clustering.

2. Hierarchical clustering.

3. K-Nearest Neighbour.

These procedures are to be applied to an example dataset. You may use any dataset from the UCI Machine Learning Repository. One example is the Iris data set, but you are encouraged to use another data set if you see fit.

1. Iris data (https://archive.ics.uci.edu/ml/datasets/Iris)
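
As a starting point, the Iris example could be loaded and prepared along the following lines. This is only a minimal sketch: it assumes the copy of the iris data that ships with R as the built-in `iris` data frame (which may differ in a few values from the UCI file), and the variable names `features`, `species`, and `features_scaled` are illustrative choices, not part of the assignment.

```r
# Load the built-in iris data frame: 150 flowers, 4 numeric measurements,
# and 1 species label. (Assumption: the built-in copy is used instead of
# downloading the file from the UCI repository.)
data(iris)

# Keep the four numeric measurements as features and hold the label aside.
features <- iris[, 1:4]
species  <- iris$Species

# Standardize each column to mean 0 and standard deviation 1 so that no
# single measurement dominates the Euclidean distances used later.
features_scaled <- scale(features)

# Quick look at the structure and the range of each measurement.
str(iris)
summary(features)
```

Standardizing is a design choice rather than a requirement; all three procedures rely on distances between observations, so it is worth explaining in the report whether and why the features were rescaled.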

Your job is to write a tutorial article that teaches the key elements of K-means clustering, hierarchical clustering, and K-Nearest Neighbour. You will write about:

1. How K-means clustering, hierarchical clustering and K-Nearest Neighbour work

2. A comparison of hierarchical clustering, K-means clustering and K-Nearest Neighbour

3. How to use R to do K-means clustering, hierarchical clustering and K-Nearest Neighbour

You should be able to explain these clearly in your report.
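
To give a sense of what the R side of the tutorial might involve, here is a hedged sketch that runs all three procedures on the built-in `iris` data. It assumes the `class` package is installed (it provides `knn()` and ships with most R installations). The seed, the choice of 3 clusters, complete linkage, the 100/50 train/test split, and k = 5 neighbours are all arbitrary illustrative settings, not values prescribed by the assignment.

```r
library(class)  # provides knn()

set.seed(452)   # arbitrary seed so the random steps are reproducible

x <- scale(iris[, 1:4])  # standardized features
y <- iris$Species        # true labels (used only for comparison and for KNN)

# K-means: partition the observations into 3 clusters.
# nstart = 25 reruns the algorithm from 25 random starts and keeps the best fit.
km <- kmeans(x, centers = 3, nstart = 25)
print(table(cluster = km$cluster, species = y))

# Hierarchical clustering: agglomerative, Euclidean distance, complete linkage.
hc <- hclust(dist(x), method = "complete")
hc_clusters <- cutree(hc, k = 3)  # cut the dendrogram into 3 groups
print(table(cluster = hc_clusters, species = y))
# plot(hc, labels = FALSE)  # uncomment to draw the dendrogram

# K-Nearest Neighbour: a supervised classifier, so it needs labelled training data.
# Simplification: the features were scaled before splitting; a stricter
# workflow would compute the scaling from the training rows only.
train_idx <- sample(nrow(x), size = 100)
pred <- knn(train = x[train_idx, ],
            test  = x[-train_idx, ],
            cl    = y[train_idx],
            k     = 5)
accuracy <- mean(pred == y[-train_idx])  # share of test flowers classified correctly
print(accuracy)
```

Note that the two clustering methods never see the species labels, whereas K-Nearest Neighbour is trained on them; that difference is a natural starting point for the comparison asked for in item 2.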

DELIVERABLES

You will prepare a PDF report consisting of two parts. The first part is the written report of no more than 7 pages. The second part is an appendix that contains the tables and figures output from R, indexed as Table 1, Table 2, Figure 1, Figure 2, etc., along with anything else you believe is important. The appendix may be of any length.

Take care to avoid plagiarism. Do not copy directly from online resources. Rephrase and reorganize what you have read in your own words. Obvious failure to avoid plagiarism may result in a failing grade.

GRADES

Your grade will be assigned competitively based on the quality and coherence of your report. My rubric includes marks for

• clarity of report,

• quality and thoroughness of the tutorial,

• the “degree of difficulty” associated with the example.

TIPS FOR HOW TO GET STARTED

There is plenty of material online for you to learn from. For a beginner, I suggest first watching a few YouTube or Coursera tutorials on these techniques; you will find many. Then read about the methods in the textbook. From there, you can search online for these procedures and read more about them. Many of the online videos show R code examples, and you can adapt this code to your example data set.

TIPS WHEN YOU GET STUCK

I encourage you to write your tutorial report on the more advanced aspects of these unsupervised learning techniques. When you get stuck and are unable to understand the relevant online resources, take a step back and try to find other resources on the same problem that are easier to understand. Deepen your understanding gradually and give yourself plenty of time. Allow yourself to come back and re-read pages that were hard to understand on a first reading. You may be surprised to find that previously hard material suddenly becomes easier once you have given it some time to sink in.

FINAL COMMENTS

I hope this is a useful experience for you. I hope that many of you can learn from this journey and prove to yourselves that you are capable of handling challenges on your own. Remember, in real life you will face situations where your job requires you to acquire a technically challenging skill on your own. This is practice...