STATS5099: Data Mining and Machine Learning

This course will introduce students to machine learning methods and modern data mining techniques, with an emphasis on practical issues and applications.

Course Arrangements and Materials

Lectures

The first lecture will be live at 3pm on Wednesday 12th January 2022. All remaining lectures will be pre-recorded and available from the Moodle site for Data Mining and Machine Learning (DMML).

Ten sets of lecture notes, with embedded lecture recordings, will be released every Monday at 9am. Rather than watching the recordings in one go, you are encouraged to read the lecture notes first and watch each recording at the point where the notes suggest it.

There are tasks for you to complete while reading the lecture notes. Some tasks will be discussed in tutorials or labs (explained below).

In addition, there will be an exercise sheet containing conceptual and applied questions. It will also be released every Monday, with solutions posted every Friday.

Tutorials

One-hour tutorials will take place on Zoom every week. Each tutorial will contain a short summary of the lecture materials, Q&As and a discussion of the conceptual questions. You are strongly encouraged to study the lecture materials and attempt the questions before attending the tutorials. The dates and times of the tutorials will be posted on Moodle.

Labs

One-hour labs will take place from 2pm to 3pm every Friday. They demonstrate the use of R to analyse practical datasets and discuss the applied questions. You can choose to attend in person or remotely on Zoom. In-person sessions will be held in the Boyd Orr Building, Lab 420. Links to the Zoom sessions will be posted on Moodle.

Course Schedule

The course will consist of ten topics, as follows.

• Week 1 – Dimension reduction and principal component analysis (PCA)

• Week 2 – PCA biplot and principal component regression

• Week 3 – k-nearest neighbours and linear discriminant analysis

• Week 4 – Tree-based methods

• Week 5 – Support vector machines

• Week 6 – Neural networks (Part I)

• Week 7 – Neural networks (Part II)

• Week 8 – Hierarchical cluster analysis

• Week 9 – Partitioning cluster analysis

• Week 10 – Recommendation systems

Software

The R language will be used throughout the course. You can download it for free from the R website: http://cran.r-project.org. RStudio is another free piece of software that makes R easier to use: https://www.rstudio.com/products/rstudio/.

Assessment

The course is assessed entirely (100%) by a written exam.

Revision material

It will be beneficial for you to revise the following subjects:

• linear regression (from Regression Models);

• eigenvectors and eigenvalues (from Preliminary Mathematics for Statisticians).

Resources

The learning material for the course is self-contained. However, for each week we will point to chapters in the following books, which you may wish to consult for additional material:

• Trevor Hastie, Robert Tibshirani and Jerome Friedman – The Elements of Statistical Learning

• Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani – An Introduction to Statistical Learning

• Alex Smola and S.V.N. Vishwanathan – Introduction to Machine Learning

• David Barber – Bayesian Reasoning and Machine Learning

• Christopher M. Bishop – Pattern Recognition and Machine Learning

• Simon Rogers and Mark Girolami – A First Course in Machine Learning