COMP 680 Statistics for Computing and Data Science

Fall 2022


COURSE DESCRIPTION:

In this course, students learn the fundamentals of probability and statistical inference and their applications in real-world contexts. Simulation-based inference is essential for developing core skills in data science and for a basic understanding of regression-based modeling. This course is designed to help students gain foundational knowledge of inference through simulation, using Python.
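As an illustration of what "inference through simulation" can look like, the following minimal sketch computes a percentile bootstrap confidence interval for a mean. It is not taken from the course materials: the function name `bootstrap_mean_ci`, the simulated data, and the parameter values are all assumptions made for the example, and it uses only the Python standard library so it runs without extra packages.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles of those means.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical data: 50 draws from a normal distribution (mean 10, sd 2).
data_rng = random.Random(42)
data = [data_rng.gauss(10, 2) for _ in range(50)]

low, high = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.fmean(data):.2f}")
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

The interval comes entirely from resampling the observed data, with no formula for the standard error, which is the sense in which the inference is simulation-based.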

 COURSE OVERVIEW:

This is one of the six core courses in the curriculum of the Master of Data Science (MDS) program. This course covers the fundamentals and principles of probability and modern statistical inference, with a focus on their applications in real-world contexts. Probability and statistics are essential tools in data science and central to fields like bioinformatics, social informatics, and machine learning. They are the foundation for quantifying uncertainty and assessing support for hypotheses and derived models, and are at the heart of areas such as efficiency analysis of algorithms and randomized algorithms. Course content includes probability and random variables, basic statistical concepts, and various methods for statistical inference and regression modeling. This course lays the groundwork for more advanced courses in the MDS program, such as Statistical Machine Learning and Deep Learning.

 INSTRUCTOR INFORMATION:

Name: Su Chen

Title: Assistant Teaching Professor

Office: Duncan Hall 2051

Email: [email protected]

Office Hours: TBA

 MEETING TIMES & LOCATION:

● Lectures: 3 hours / week

● Meeting time: MW 2:00-3:15pm

● Location: DCH 1070

TEACHING ASSISTANT:

Name: TBA

Email:

PREREQUISITES: College-level calculus (single-variable and multivariable). Students should be comfortable with mathematical reasoning and familiar with vectors and matrices, sequences, limits, infinite series, the chain rule, and ordinary or multiple integrals.

COURSE OBJECTIVES & LEARNING OUTCOMES:


In this course, students will learn how to summarize, explore, and model data to extract useful information and knowledge. Students completing this course will be able to:

● Identify and calculate appropriate summary and inferential statistics and infer appropriate conclusions based on data.

● Gain fluency in basic programming skills in Python with a focus on simulation-based inference and statistical modeling.

● Use applied statistical knowledge to analyze real-world data, test hypotheses, build regression models, and make scientific inference.

● Interpret inferences and modeling results in real-world contexts and communicate the findings effectively.

 TEXTBOOK AND OTHER RESOURCES:

● Textbooks Recommended:

○ All of Statistics: a concise course in statistical inference

○ An Introduction to Statistical Learning

○ Computer age statistical inference

● Software: Python 3 with the standard modules from an Anaconda installation, such as NumPy, pandas, and Matplotlib, as well as some statistics and machine learning modules.

● Other: Course slides, Jupyter notebooks, and assignments will be posted to the Canvas course website.
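One way to confirm that the software listed above is available is a short check like the one below. This is a sketch, not an official course script: the helper name `check_environment` is hypothetical, and the module names are assumed to be the usual import names (`numpy`, `pandas`, `matplotlib`).

```python
import importlib.util
import sys

# Assumed import names for the packages named in the software list.
REQUIRED = ["numpy", "pandas", "matplotlib"]


def check_environment(modules=REQUIRED):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_environment()
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only looks each package up without importing it, so it runs quickly and reports missing packages instead of stopping at the first failure.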

 COURSEWORK:

The major work of this course will be weekly quizzes and bi-weekly homework assignments.

● Homework assignments will be a mixture of mathematical derivation and programming tasks.

○ Assignments are required to be completed in Jupyter notebooks using a combination of Markdown & LaTeX for mathematical derivation, and Python code for programming. Some tutorials will be provided in class.

● Weekly quizzes will be mostly conceptual, in multiple-choice format, and will be posted on Canvas.

● The midterm will be an in-class, closed-book written exam, and the final will be a take-home exam with simulation exercises and real-world data analysis problems.

 GRADING POLICIES:

● Assignment: 50%

○ Bi-weekly homework: 30%

○ Weekly quizzes: 15%

○ Course surveys: 5%

● Exams: 40%

○ Midterm exam: 20%

○ Final exam: 20%

● Class participation and attendance: 10%

 ATTENDANCE AND MAKE-UP POLICIES:

Active participation in class and in groups is expected. In the event of an excused absence on a date on which an assignment is due or a class activity has been planned, the student must immediately contact the instructor to arrange a make-up time or assignment.

 TENTATIVE SCHEDULE:

Date / Topics / Weekly Assignments

8/22 Week 1: Probability I

● Introduction and course logistics

● Review of probability theory

● Discrete random variables

● Continuous random variables

● Bivariate, marginal and conditional distributions

● Install Python3 and Jupyter notebook through Anaconda

● HW1 assigned

8/29 Week 2: Probability II

● Multivariate random variables

● Gaussian random vectors

● Expectations, variance and covariance

● Probability Inequalities

● Quiz 1 due

9/5 Week 3: Statistical Inference

● Monday: Labor Day, no class

● Population and random sampling

● Empirical distribution and sampling distribution

● Law of large numbers and Central Limit Theorem

● HW1 due

● Quiz 2 due

● HW2 assigned


9/12 Week 4: Parametric Inference and Maximum Likelihood Estimate

● Point estimator and confidence intervals

● The likelihood functions

● MLE and its properties

● The Delta method

● Quiz 3 due

9/19 Week 5: Nonparametric Inference

● Parametric and nonparametric bootstrap

● Bootstrap confidence interval

● Kernel density estimate

● HW2 due

● Quiz 4 due

● HW3 assigned

9/26 Week 6: Hypothesis Testing I

● General framework of testing hypotheses

● p-values and error probabilities

● Common parametric tests

● Quiz 5 due

10/3 Week 7: Hypothesis Testing II

● The likelihood ratio test

● The goodness of fit test

● Nonparametric tests: permutation test and A/B testing

● Multiple testing and FDR control

● HW3 due

● Practice midterm exam

10/10 Week 8: Midterm

● Monday: Midterm recess, no class

● Wednesday: Midterm exam

● HW4 assigned

10/17 Week 9: Bayesian Inference

● The Bayesian method

● Conjugate families and non-informative priors

● Empirical Bayes

● Quiz 6 due


10/24 Week 10: Stochastic Processes

● Markov Chains

● Poisson Processes

● HW4 due

● Quiz 7 due

● HW5 assigned

10/31 Week 11: Linear Regression

● Correlation and simple linear regression

● Least squares and maximum likelihood

● Multiple linear regression

● Model comparison and selection

● Quiz 8 due

11/7 Week 12: Logistic Regression and Generalized Linear Models

● Logistic regression

● Multinomial regression

● Exponential family and canonical link functions

● HW5 due

● Quiz 9 due

● HW6 assigned

11/14 Week 13: Regularized and nonlinear regression

● Ridge regression

● Lasso regression

● Nonlinear regression with splines

● Quiz 10 due

11/21 Week 14: Generalized additive models

● Generalized additive models

● Wednesday: Thanksgiving recess, no class

● No assignment

11/28 Week 15: Optional Topics

● Gaussian mixture models

● EM algorithm

● Linear discriminant analysis (LDA)

● Naïve Bayes classifier

● HW6 due

12/7 – 12/13 Final Week

● Take-home final exam

● Final exam


HONOR CODE:

The Rice Honor Code is a privilege and a responsibility. The work you submit for this class is expected to be the result of your own efforts. Attempting to take credit for someone else’s work by turning it in as your own constitutes plagiarism, as defined by the Rice Honor Code. Please refer to the following honor code policies for this class. You are responsible for reading and understanding these policies.

● Homework assignments: While each student is required to submit their own solution to each assignment, you are welcome to discuss the problems with your classmates and the instructor. However, you may not send your solution to your classmates for any reason.

● Quizzes: You must complete the quizzes on your own. However, you may refer to any course materials and notes.

● Tests: Tests given during class will be closed book. For the take-home exam, you are free to use any resources, including all course materials and search engines. However, you are not allowed to discuss your take-home exam with anyone inside or outside this class.

 DISABILITY ACCOMMODATIONS:

Students with a documented disability requiring accommodations should speak with the instructor during the first two weeks of class.

 TITLE IX:

Rice University cares about your wellbeing and safety. Rice encourages any student who has experienced an incident of harassment, pregnancy discrimination, gender discrimination, or relationship, sexual, or other forms of interpersonal violence to seek support through The SAFE Office. Please be aware when seeking support on campus that most employees, including myself, as the instructor, are required by Title IX to disclose all incidents of non-consensual interpersonal behaviors to Title IX professionals on campus who can act to support that student and meet their needs. For more information, please visit safe.rice.edu or email [email protected].

 SYLLABUS CHANGE POLICY:

This syllabus and course schedule are subject to change with reasonable advance notice by the instructor.