DATA 311



Winter 2021 Term 1

Course Description

Official Calendar: Regression, classification, resampling, model selection and validation, fundamental properties of matrices, dimension reduction, tree-based methods, unsupervised learning. Credit will be granted for only one of STAT 311 or DATA 311. Credits: 3

Pre-reqs: Either (a) STAT 230 or (b) a score of more than 75% in one of APSC 254, BIOL 202, PSYO 373; and one of COSC 111, APSC 177.

Course Objectives: The course is designed to introduce students to classical machine learning methods for regression and classification with an emphasis on model validation (i.e. it is not enough to fit a model, students should be able to estimate how good the resulting model is). By taking this course, students will gain experience in applying machine learning algorithms in R and develop skills for effectively communicating a proper interpretation of the results.

Learning Outcomes:

At the end of this course, students should be able to:

1. build a model and validate it

2. understand fundamental proofs for techniques that rely on matrix algebra

3. compute linear regression and apply hypothesis testing

4. perform logistic regression and discriminant analysis

5. apply the K-fold cross-validation methods

6. apply the LASSO and ridge regression methods

7. apply bagging and boosting on tree-based methods

8. apply some methods of unsupervised learning (e.g. principal components or k-means clustering)

9. manipulate data sets in R including applying the above methods

Course Format: Synchronous lectures (i.e. real-time delivery) will be held via Zoom (links provided on Canvas). These Zoom sessions will be recorded and uploaded to Canvas for review. Slide decks will be posted to Canvas prior to our scheduled lecture time. Slides might be supplemented with handwritten material, which I will upload to Canvas after lecture. Lectures may also include discussions that you can only access by attending and/or reviewing the Zoom lectures. While statistical software (i.e. R code and output) will be discussed during lecture, practical skills and applications of topics are covered primarily in computer labs.

Office Hour Format: I will hold virtual office hours via Zoom each Thursday, 3:00 – 4:00 PM. If you are unable to make that time, please reach out to me to schedule an alternative appointment. Zoom links are provided on Canvas.

Marking and Evaluation

Final grades will be based on the evaluations listed above. Note that the alternative weighting schemes will only be used if they improve your grade relative to the default calculation. Final grades will be assigned according to the standardized grading system outlined in the UBC Okanagan Calendar.

Grading Practices: Faculties, departments, and schools reserve the right to scale grades in order to maintain equity among sections and conformity to University, faculty, department, or school norms. Students should therefore note that an unofficial grade given by an instructor might be changed by the faculty, department, or school. Grades are not official until they appear on a student’s academic record. http://www.calendar.ubc.ca/okanagan/index.cfm?tree=3,41,90,1014

Midterms: There will be two (2) synchronous midterms given during the term. Midterms will be formatted as timed Canvas quizzes scheduled during our designated lecture time on Wednesday, October 13 and Monday, November 15.

Final Exam: The examination period begins Saturday, December 11 and ends Wednesday, December 22. The final exam is comprehensive, covering all the material presented throughout the course. The final exam will be held online in a format similar to your midterms. The date and time are to be determined (TBD).

Assignments: There will be approximately four (4) assignments. Assignments will incorporate material covered during lab as well as lecture. Answers will be submitted electronically through Canvas.

Missing/Late Grade Items:

Assignments  Assignments are to be submitted electronically through Canvas. Late assignments will have 10% deducted for each day (which includes weekends) beyond the due date. Assignments that are more than 2 days (i.e. 48 hours) overdue will not be accepted.

Midterms  Missed midterms will have their weight shifted to the final according to the Alt 2/3 grading scheme. NO make-up tests will be provided.

Final Examinations  Except in the case of examination clashes and hardships (three or more formal examinations scheduled within a 24-hour period) or unforeseen events, students will be permitted to apply for out-of-time final examinations only if they are representing the University, the province, or the country in a competition or performance; serving in the Canadian military; observing a religious rite; working to support themselves or their family; or caring for a family member. Unforeseen events include (but may not be limited to) the following: ill health or other personal challenges that arise during a term and changes in the requirements of an ongoing job. Further information on Academic Concession can be found under Policies and Regulations in the Okanagan Academic Calendar: http://www.calendar.ubc.ca/okanagan/index.cfm?tree=3,48,0,0

Course Material and Tools

Course Website: The course website can be accessed through UBCO’s Learning Management System (LMS) Canvas: https://canvas.ubc.ca/. It is recommended that you log in daily to check for announcements, participate in discussions, access course materials, submit and complete assignments, and review upcoming deadlines. You can review and change default notification preferences for this course if you wish.

Textbooks: Our primary source of reference will be:

Title: An Introduction to Statistical Learning with Applications in R, Second Edition

Authors: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani

Download page: https://www.statlearning.com/ (free)

Some additional content will have coverage in:

Title: The Elements of Statistical Learning: data mining, inference, and prediction

Authors: Hastie, Tibshirani, Friedman.

Download page: https://web.stanford.edu/~hastie/ElemStatLearn/ (free)

Software: Our course will exclusively be using R. I strongly recommend that you use RStudio for running R.


Lab format: All students must be registered for a lab (held weekly unless otherwise specified). Please check your registration to determine your lab section and time. Labs are structured as walk-through tutorials that help develop the practical skills of performing machine learning in R. You may work through the lab material on your own time and/or during your scheduled lab. To ensure that TAs are not overloaded during a single lab, please do not attend labs for which you are not registered.

Lab sessions will be hosted by your TA online via Zoom (links provided on Canvas). While TAs are primarily there to provide guidance on carrying out analyses in R, labs additionally provide the opportunity to meet other students from class, ask questions and/or discuss concepts from lecture, and receive assistance on assignments. Thus, labs will act as additional “office hours” held by your TAs. While labs are not mandatory (i.e. attendance will not be taken), you are highly encouraged to attend. Do not skip going through this material, as lab content will be fair game for testing on midterms and the final exam.

Tentative Course Schedule

Below is a tentative outline for the course. These topics are subject to change depending upon how quickly we can cover the material.

Please note the following holidays:

- National Day for Truth and Reconciliation, September 30

- Thanksgiving Day, October 11

- Midterm break, November 8–12 (inclusive)

If you observe any holidays that are not listed above and that may conflict with the outlined course structure, please feel free to contact me directly. For other important UBCO-related dates, visit: http://www.calendar.ubc.ca/okanagan/academicyear.cfm


Your responsibilities to this class and to your education as a whole include attendance and participation. You have a responsibility to help create a classroom environment where all may learn. At the most basic level, this means you will respect the other members of the class and the instructor and treat them with the courtesy you hope to receive in return. Inappropriate classroom behaviour may include: disruption of the classroom atmosphere, profanity in classroom discussion, and use of abusive or disrespectful language toward the instructor, a student in the class, or other individuals or groups. While I do not require that you turn on your video during lecture, I trust that those in attendance will remain present and refrain from engaging in non-class activities. I also ask that students' microphones remain muted during lectures (unless addressing the class) to prevent background noise distractions.

