CS-GY 6923 Machine Learning
Syllabus
Computer Science & Engineering
Course Outline CS-GY 6923 Machine Learning
Spring 2022 1/24/2022 - 5/9/2022
Course Pre-requisites
Course Description
This course is an introduction to the field of machine learning, covering fundamental
techniques for classification, regression, dimensionality reduction, clustering, and model
selection.
There are three parts to this course.
In Part 1 (EDA), students will choose any dataset with 100K observations and 15 or more
dimensions suitable for classification and perform comprehensive exploratory data
analysis. Students will write a detailed report on the dataset and its suitability for
further analysis using classification techniques.
In Part 2 (Classification), students will use any three (or more) single classifiers and
write a critical review and analysis of their performance and plausible causes for the
differences observed.
In Part 3 (Ensemble), students will use three of the many ensemble methods to
improve performance and write a report comparing the performance of the individual
classifiers in Part 2 and the meta classifiers in Part 3.
Throughout the 14-week semester, students will submit a weekly report enumerating
progress made, problems encountered, and the plan for the following week. The instructor
will introduce many supervised techniques, optimization methods, unsupervised
methods, and several ensemble techniques.
The instructor will share working R implementations, and students are challenged to recast
them in reusable and parallel versions for the three deliverables in Parts 1, 2, and 3.
The EDA report is due the last week of February.
The comparative classifier report is due the last week of March.
The ensemble method report is due the last week of April.
There will be two tests, one in the 3rd week of March and one in the 3rd week of April.
Course Objectives
Students are expected to attain:
1. Conceptual understanding of both supervised and unsupervised learning
techniques: the statistical/algebraic foundations of these techniques, their
relative strengths and weaknesses, and the theoretical and practical criteria for
adopting a model.
2. Understanding of the process discipline: collect, describe, model, explore, and verify
data.
3. Engineering: use an industry-standard environment and process to conduct
repeatable and reproducible classification experiments.
4. Experimentation and analysis: run a prescribed process to optimize a model using
multiple classification algorithms, and evaluate them using standard performance
metrics.
5. Delivery: present summary results of the experiments and explain the key decisions
made in designing the model and model output.
Course Structure
This is an online course. All lectures and meetings are held on Zoom, accessed
through brightspace.nyu.edu.
We meet on Mondays at 8:00 PM.
Office hours are on Mondays from 4 to 6 PM, by appointment; email by Sunday 5 PM.
For participation, students have to make something original and comment on two
comments made by other students before the next class.
Readings
URL: https://statlearning.com/
AUTHORS: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
TITLE: An Introduction to Statistical Learning, CODE: ISLR
An optional and recommended text
URL: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
AUTHORS: Trevor Hastie, Robert Tibshirani, Jerome Friedman
TITLE: The Elements of Statistical Learning, CODE: ESL
Online resources: https://www.quora.com/What-is-the-best-book-to-learn-ML
Stack Overflow and other sources
Greene01: Required Reading
pages.stern.nyu.edu/~wgreene/Text/Greene-EA-7&8ed-Appendices.pdf
math.nyu.edu/~cfgranda/pages/DSGA1002_fall16/material/linear_algebra.pdf
https://www.cns.nyu.edu/~eero/math-tools/Handouts/linalg_jordan_86.pdf
Fisher01: Required Reading
Fisher's Discriminant Analysis: Linear Discriminant Analysis, Multivariate QDA [ISLR: 4.6.3,4.6.4]
NN: Neural Nets (any introductory material will suffice)
Chapter 11, Introduction to Machine Learning, 3rd Edition, Ethem Alpaydin
Chapter 04 and Chapter 05, Miroslav Kubat (Springer)
Grade Distribution
Quality of Performance                        Letter Grade   Range %
                                              A+             97 - 100
Excellent - work is of exceptional quality    A              93 - 96.9
                                              A-             90 - 92.9
Good - work is above average                  B+             87 - 89.9
Satisfactory                                  B              83 - 86.9
Below Average                                 B-             80 - 82.9
Poor                                          C+             77 - 79.9
                                              C              70 - 76.9
Failure                                       F              < 70
Not everyone can get an A or A-. There will be a distribution of grades.
Grade Calculation
Grades in this course are determined by the percentage of points obtained.
Course assignments                                    Percentage of Final Grade   Points
Homework 1 (HW01 - EDA)                               15.00%                      15
    EDA assignment (15%), due 03/05/2022.
    Analyze the structure of your dataset (15%), due 09/03/21, identifying
    redundant, correlated, and constant (useless) features, variableImportance, and VIF.
Homework 2 (HW02 - Individual Classifier)             20.00%                      20
    Experiments with 3 or more Supervised Learning Techniques.
    Estimate variance and bias.
    Tabulate critical/appropriate metrics.
    Write a critical review of the observed performance differences, due 04/09/2022.
Engagement                                            10.00%                      10
    You must participate in weekly forums and discussions.
    Discussions are applied analysis from the texts.
    You must post a response by Sunday midnight (ET).
    You must submit your weekly engagement.
    You should provide meaningful feedback on the analysis.
HW03 - experiments with Ensemble techniques           25.00%                      25
    Improve your work in HW02 using
    CV/Bagging/Boosting/RandomForest/Stacking - 15%
    Due 05/07/2022 - analyze and summarize, compare and contrast these
    techniques and their utility to improve performance - 15%
Open book, open notes review test - t1, 03/25/22      10.00%                      10
Open book, open notes review test - t2, 04/25/22      10.00%                      10
Quiz, once every two weeks                            10.00%                      10
Total                                                 100%                        100
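As a rough illustration of the HW01 screening step listed above (finding constant and highly correlated features), here is a minimal Python sketch. It is not the instructor's R implementation. The function names, the 0.95 threshold, and the toy data are all assumptions made for the example, and VIF and variable importance are left out.

```python
from itertools import combinations
from math import sqrt


def constant_features(columns):
    """Return the names of columns whose values never vary."""
    return [name for name, values in columns.items() if len(set(values)) <= 1]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant column; report 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Hypothetical toy data: "b" is a rescaled copy of "a", "c" is constant.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [7.0, 7.0, 7.0, 7.0, 7.0],
    "d": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(constant_features(toy))
print(correlated_pairs(toy))
```

On the toy data this flags "c" as constant and ("a", "b") as a redundant pair. A real dataset with 100K observations would more likely be screened with a data-frame library than with plain lists.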
Course requirements
Submit all assignments before 11:59 PM on the due date specified above.
Course Outline:
Please note that this schedule is subject to change depending on progress, questions,
requests, etc.
Week, Topics, and Tentative Date/Reading:
Week 1
    Topics: Machine Learning, Supervised Learning (Classification),
    Unsupervised Learning (Clustering). Reinforcement Learning is not in scope.
    Supervised Learning: generic concepts applicable to all supervised learners:
    Occam's Razor, No Free Lunch Theorem, Induction/Generalization, Loss Function
    Minimization (aka Optimization), Bias/Variance, inability to learn,
    inability to generalize.
    Reading: ISLR: Chap 01, Chap 2.1, 2.2.2, 2.2.3, Chap 4.1
Week 2
    Topics: Refresher: Advanced Probability [Expected Value and Linear Algebra,
    Matrices, Vectors] / Statistics [i.i.d., CLT, LLN, descriptive/summary
    statistics, moments] / Dataset manipulation in R, Datasets,
    Numerical/Categorical data, Scale and Models.
    Reading: ISLR: Chap 02; Greene01
Week 3
    Topics: Supervised Learning: Regression, Logistic Regression.
    Antidote to overfitting: regularization techniques: Ridge (L2), Lasso (L1).
    Classifier performance: TP, FP, TN, FN, ROC, AUC, Accuracy, Specificity,
    Sensitivity, Precision, Recall.
    Reading: ISLR: Chap 4.1, 4.3, 4.6.2
Week 4
    Topics: Sigmoid function, activation function, perceptron (aka neural nets
    or ANNs). Extending the perceptron with backpropagation, hidden layers,
    other activation functions. Explainability. Regularization: Ridge, Lasso,
    and ElasticNet.
    Reading: ISLR:
Week 5
    Topics: Uncorrelated Features: Naïve Bayes.
    Generative vs Discriminative Classifiers.
Week 6
    Topics: Instance-based techniques (no assumptions about the distribution,
    aka non-parametric): Distance, Nearest Neighbor, kNN.
    Reading: Chap 4.6.5
Week 7
    Topics: Curse of Dimensionality, Mahalanobis Distance, Dimensionality
    Reduction (LDA as Feature Selection, LASSO as Feature Selection,
    PCA (Cholesky, Eigen, SVD)).
    Reading: ISLR: 4.4.2; ISLR: 6.3
Week 8
    Topics: Decision Trees (no assumptions about the distribution, aka
    non-parametric), Entropy, Information Gain.
    Reading: ISLR: 8
Week 9
    Topics: Support Vectors (no assumptions about the distribution, aka
    non-parametric). Support Vector Machines: Margins, Kernel, Radial Basis,
    Gaussian.
    Reading: ISLR: 9
Week 10
    Topics: What causes inferior performance, techniques for performance
    improvement. Resampling: varying dataset, Bootstrapping, Cross Validation,
    Bagging, Boosting.
    Reading: ISLR: Chap 05; ISLR: 8.2
Week 11
    Topics: Combining Classifiers: aggregating variants of one classifier,
    combining heterogeneous classifiers, Stacking.
    Reading: article (instructor will provide)
Week 12
    Topics: Relevance of Stacking/CV/Bagging to parallelism and NFL.
    Reading: article (instructor will provide)
Week 13
    Topics: Unsupervised Learning; Clustering; Topic Modeling in Text
    Analytics, SOMs.
    Reading: ISLR: Chap. 10; survey article (instructor will provide)
Week 14
    Topics: Semi-Supervised: leveraging strengths of unsupervised and
    supervised.
    Reading: article (instructor will provide)
Week 15
    Topics: Final Exam.
    Reading: No class
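The classifier performance measures named in Week 3 of the outline (TP, FP, TN, FN, accuracy, sensitivity/recall, specificity, precision) all follow from four counts. The Python sketch below is an illustration only, not course material. The function names, the choice of 1 as the positive label, and the toy labels are assumptions made for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, TN, FN) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive standard metrics from the four confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical labels: 4 positives and 6 negatives, with 2 mistakes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(classification_metrics(*counts))
```

With these toy labels the counts are TP=3, FP=1, TN=5, FN=1, giving accuracy 0.8, sensitivity 0.75, precision 0.75, and specificity 5/6. ROC and AUC need predicted scores rather than hard labels, so they are not covered here.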
Moses Center Statement of Disability
If you are a student with a disability who is requesting accommodations, please
contact New York University’s Moses Center for Students with Disabilities (CSD)
at 212-998-4980 or [email protected]. You must be registered with CSD to
receive accommodations. Information about the Moses Center can be found at
www.nyu.edu/csd. The Moses Center is located at 726 Broadway on the 3rd
floor.
NYU School of Engineering Policies and Procedures on Academic
Misconduct – complete Student Code of Conduct here
A. Introduction: The School of Engineering encourages academic
excellence in an environment that promotes honesty, integrity, and
fairness, and students at the School of Engineering are expected to
exhibit those qualities in their academic work. It is through the process
of submitting their own work and receiving honest feedback on that
work that students may progress academically. Any act of academic
dishonesty is seen as an attack upon the School and will not be
tolerated. Furthermore, those who breach the School’s rules on
academic integrity will be sanctioned under this Policy. Students are
responsible for familiarizing themselves with the School’s Policy on
Academic Misconduct.
B. Definition: Academic dishonesty may include misrepresentation,
deception, dishonesty, or any act of falsification committed by a
student to influence a grade or other academic evaluation. Academic
dishonesty also includes intentionally damaging the academic work of
others or assisting other students in acts of dishonesty. Common
examples of academically dishonest behavior include, but are not
limited to, the following:
1. Cheating: intentionally using or attempting to use unauthorized
notes, books, electronic media, or electronic communications in
an exam; talking with fellow students or looking at another
person’s work during an exam; submitting work prepared in
advance for an in-class examination; having someone take an
exam for you or taking an exam for someone else; violating
other rules governing the administration of examinations.
2. Fabrication: including but not limited to, falsifying experimental
data and/or citations.
3. Plagiarism: intentionally or knowingly representing the words or
ideas of another as one’s own in any academic exercise; failure
to attribute direct quotations, paraphrases, or borrowed facts or
information.
4. Unauthorized collaboration: working together on work meant to
be done individually.
5. Duplicating work: presenting for grading the same work for more
than one project or in more than one class, unless express and
prior permission has been received from the course instructor(s)
or research adviser involved.
6. Forgery: altering any academic document, including, but not
limited to, academic records, admissions materials, or medical
excuses.
NYU School of Engineering Policies and Procedures on Excused Absences
– complete policy here
A. Introduction: An absence can be excused if you have missed no more
than 10 days of school. If an illness or special circumstance has
caused you to miss more than two weeks of school, please refer to the
section labeled Medical Leave of Absence.
B. Students may request special accommodations for an absence to be
excused in the following cases:
1. Medical reasons
2. Death in immediate family
3. Personal qualified emergencies (documentation must be
provided)
4. Religious Expression or Practice
Deanna Rayment, [email protected], is the Coordinator of Student
Advocacy, Compliance and Student Affairs and handles excused absences. She
is located in 5 MTC, LC240C and can assist you should it become necessary.
NYU School of Engineering Academic Calendar – complete list here. The
last day of the final exam period is _____. Final exam dates for undergraduate
courses will not be determined until later in the semester. Final exams for
graduate courses will be held on the last day of class during the week of _____.
If you have two final exams at the same time, report the conflict to your
professors as soon as possible. Do not make any travel plans until the exam
schedule is finalized.
Also, please pay attention to notable dates such as Add/Drop, Withdrawal, etc.
For confirmation of dates or further information, please contact Susana:
[email protected]