STATS5099: Data Mining and Machine Learning
This course will introduce students to machine learning methods and modern data mining techniques, with an emphasis on practical issues and applications.
Course Arrangements and Materials
Lectures
The first lecture will be live at 3pm on Wednesday 12th January 2022. All remaining lectures will be pre-recorded and available from the Moodle site for Data Mining and Machine Learning (DMML).
Ten sets of lecture notes, with embedded lecture recordings, will be released weekly, every Monday at 9am. Rather than watching the recordings in one go, you are encouraged to read the lecture notes first and watch each recording at the point where the notes suggest it.
There are tasks for you to complete while reading the lecture notes. Some tasks will be discussed in tutorials or labs (explained below).
In addition, an exercise sheet containing conceptual and applied questions will be made available every Monday, with solutions posted every Friday.
Tutorials
One-hour tutorials will take place on Zoom every week. Each tutorial will contain a short summary of the lecture materials, a Q&A session and a discussion of the conceptual questions. You are strongly encouraged to study the lecture materials and attempt the questions before attending the tutorials. The date and time of the tutorials will be posted on Moodle.
Labs
One-hour labs will take place at 2-3pm every Friday. Each lab will demonstrate the use of R to analyse practical datasets and discuss the applied questions. You can choose to attend in person or remotely on Zoom. In-person sessions will be held in the Boyd Orr Building, Lab 420. Links to the Zoom sessions will be posted on Moodle.
Course Schedule
The course will consist of ten topics, as follows.
• Week 1 – Dimension reduction and principal component analysis (PCA)
• Week 2 – PCA biplot and principal component regression
• Week 3 – k-nearest neighbours and linear discriminant analysis
• Week 4 – Tree-based methods
• Week 5 – Support vector machines
• Week 6 – Neural networks (Part I)
• Week 7 – Neural networks (Part II)
• Week 8 – Hierarchical cluster analysis
• Week 9 – Partitioning cluster analysis
• Week 10 – Recommendation systems
Software
The R language will be used throughout the course. You can download it for free from the R website: http://cran.r-project.org. RStudio is another free piece of software that makes R easier to use: https://www.rstudio.com/products/rstudio/.
Assessment
100% written exam.
Revision material
You will benefit from revising the following subjects:
• linear regression (from Regression Models);
• eigenvectors and eigenvalues (from Preliminary Mathematics for Statisticians).
Resources
The learning material for this course is self-contained. However, for each week we will point to chapters in the following books, which you may wish to consult for additional material:
• Trevor Hastie, Robert Tibshirani and Jerome Friedman – The Elements of Statistical Learning
• Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani – An Introduction to Statistical Learning
• Alex Smola and S.V.N. Vishwanathan – Introduction to Machine Learning
• David Barber – Bayesian Reasoning and Machine Learning
• Christopher M. Bishop – Pattern Recognition and Machine Learning
• Simon Rogers and Mark Girolami – A First Course in Machine Learning
2022-07-28