COMP 680 Statistics for Computing and Data Science Fall 2022
COURSE DESCRIPTION:
In this course, students learn the fundamentals of probability and statistical inference and their applications in real-world contexts. Simulation-based inference is essential for developing core skills in data science and for a basic understanding of regression-based modeling. This course is designed to help students gain a foundational knowledge of inference through simulation, using the Python programming language.
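As a rough illustration of what simulation-based inference can look like in Python, the sketch below computes a percentile bootstrap confidence interval for a sample mean. It is not taken from the course materials: the function name `bootstrap_mean_ci`, the simulated sample, and the number of resamples are all assumptions chosen for illustration, and only the standard library is used so the example runs without extra packages.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles of those means.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical sample: 200 draws from a normal distribution (mean 5, sd 2).
rng = random.Random(42)
sample = [rng.gauss(5.0, 2.0) for _ in range(200)]

low, high = bootstrap_mean_ci(sample)
print(f"sample mean = {statistics.fmean(sample):.3f}")
print(f"95% bootstrap CI = ({low:.3f}, {high:.3f})")
```

The interval comes entirely from resampling the observed data, with no formula for the standard error, which is the sense in which the inference is simulation-based.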
COURSE OVERVIEW:
This is one of the six core courses in the curriculum of the Master of Data Science (MDS) program. It covers the fundamentals and principles of probability and modern statistical inference, with a focus on their applications in real-world contexts. Probability and statistics are essential tools in data science and central to fields like bioinformatics, social informatics, and machine learning. They are the foundation for quantifying uncertainty and assessing support for hypotheses and derived models, and are at the heart of areas such as efficiency analysis of algorithms and randomized algorithms. Course content includes probability and random variables, basic statistical concepts, and various methods for statistical inference and regression modeling. This course lays the groundwork for more advanced courses in the MDS program, such as Statistical Machine Learning and Deep Learning.
INSTRUCTOR INFORMATION:
Name: Su Chen
Title: Assistant Teaching Professor
Office: Duncan Hall 2051
Email: [email protected]
Office Hours: TBA
MEETING TIMES & LOCATION:
● Lectures: 3 hours / week
● Meeting time: MW 2:00-3:15pm
● Location: DCH 1070
TEACHING ASSISTANT:
Name: TBA
Email:
PREREQUISITES: College-level calculus (single-variable & multivariable). Students should be comfortable with mathematical reasoning and familiar with vectors and matrices, sequences, limits, infinite series, the chain rule, and ordinary and multiple integrals.
COURSE OBJECTIVES & LEARNING OUTCOMES:
In this course, students will learn how to summarize, explore, and model data to extract useful information and knowledge. Students completing this course will be able to:
● Identify and calculate appropriate summary and inferential statistics and infer appropriate conclusions based on data.
● Gain fluency in basic programming skills in Python with a focus on simulation-based inference and statistical modeling.
● Use applied statistical knowledge to analyze real-world data, test hypotheses, build regression models, and make scientific inference.
● Interpret inferences and modeling results in real-world contexts and communicate the findings effectively.
TEXTBOOK AND OTHER RESOURCES:
● Textbooks Recommended:
○ All of Statistics: A Concise Course in Statistical Inference
○ An Introduction to Statistical Learning
○ Computer Age Statistical Inference
● Software: Python 3 with the standard modules from an Anaconda installation, such as NumPy, pandas, and Matplotlib, as well as some statistics and machine learning modules.
● Other: Course slides, Jupyter notebooks, and assignments will be posted to the Canvas course website.
COURSEWORK:
The major work of this course will be weekly quizzes and bi-weekly homework assignments.
● Homework assignments will be a mixture of mathematical derivation and programming tasks.
○ Assignments are required to be completed in Jupyter notebooks using a combination of Markdown & LaTeX for mathematical derivation, and Python code for programming. Some tutorials will be provided in class.
● Weekly quizzes will be mostly conceptual, in multiple-choice format, and will be posted on Canvas.
● The midterm will be an in-class, closed-book written exam. The final will be a take-home exam with simulation exercises and real-world data analysis problems.
GRADING POLICIES:
● Assignments: 50%
○ Bi-weekly homework: 30%
○ Weekly quizzes: 15%
○ Course surveys: 5%
● Exams: 40%
○ Midterm exam: 20%
○ Final exam: 20%
● Class participation and attendance: 10%
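To see how these weights combine, here is a minimal sketch in Python. The syllabus does not spell out the formula, so the sketch assumes that each component is scored on a 0-100 scale and that the final grade is a plain weighted average. The names `WEIGHTS` and `final_grade` and the example scores are hypothetical.

```python
# Weights as listed in the grading policy, as fractions of the final grade.
WEIGHTS = {
    "homework": 0.30,
    "quizzes": 0.15,
    "surveys": 0.05,
    "midterm": 0.20,
    "final": 0.20,
    "participation": 0.10,
}


def final_grade(scores):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "homework": 90,
    "quizzes": 85,
    "surveys": 100,
    "midterm": 78,
    "final": 88,
    "participation": 95,
}
print(f"final grade: {final_grade(example):.2f}")
```

The three assignment components (30% + 15% + 5%) add up to the 50% assignment share, and all six weights sum to 100%.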
ATTENDANCE AND MAKE-UP POLICIES:
Active participation in class and in groups is expected. In the event of an excused absence on a date on which an assignment is due or a class activity has been planned, the student must immediately contact the instructor to arrange a make-up time or assignment.
TENTATIVE SCHEDULE:
Date | Topics | Weekly Assignments |
8/22 Week 1: Probability I | ● Introduction and course logistics ● Review of probability theory ● Discrete random variables ● Continuous random variables ● Bivariate, marginal and conditional distributions | ● Install Python3 and Jupyter notebook through Anaconda ● HW1 assigned |
8/29 Week 2: Probability II | ● Multivariate random variables ● Gaussian random vectors ● Expectations, variance and covariance ● Probability Inequalities | ● Quiz 1 due |
9/5 Week 3: Statistical Inference | ● Monday: Labor day no class ● Population and random sampling ● Empirical distribution and sampling distribution ● Law of large numbers and Central Limit Theorem | ● HW1 due ● Quiz 2 due ● HW2 assigned |
9/12 Week 4: Parametric Inference and Maximum Likelihood Estimate | ● Point estimator and confidence intervals ● The likelihood functions ● MLE and its property ● The Delta method | ● Quiz 3 due |
9/19 Week 5: Nonparametric Inference | ● Parametric and nonparametric Bootstrap ● Bootstrap confidence interval ● Kernel density estimate | ● HW2 due ● Quiz 4 due ● HW3 assigned |
9/26 Week 6: Hypothesis Testing I | ● General framework of testing hypotheses ● p-values and error probabilities ● Common parametric tests | ● Quiz 5 due |
10/3 Week 7: Hypothesis Testing II | ● The likelihood ratio test ● The goodness of fit test ● Nonparametric tests: permutation test and A/B testing ● Multiple testing and FDR control | ● HW3 due ● Practice midterm exam |
10/10 Week 8: Midterm | ● Monday: midterm recess no class ● Wednesday: Midterm exam | ● HW4 assigned |
10/17 Week 9: Bayesian Inference | ● The Bayesian method ● Conjugate families and non-informative priors ● Empirical Bayes | ● Quiz 6 due |
10/24 Week 10: Stochastic Processes | ● Markov Chains ● Poisson Processes | ● HW4 due ● Quiz 7 due ● HW5 assigned |
10/31 Week 11: Linear Regression | ● Correlation and simple linear regression ● Least squares and maximum likelihood ● Multiple linear regression ● Model comparison and selection | ● Quiz 8 due |
11/7 Week 12: Logistic Regression and Generalized Linear Models | ● Logistic regression ● Multinomial regressio
2022-11-09