
COMP 680 Statistics for Computing and Data Science

Fall 2022

COURSE DESCRIPTION:

In this course, students learn the fundamentals of probability and statistical inference and their applications in real-world contexts. Simulation-based inference is essential for developing core skills in data science and for a basic understanding of regression-based modeling. This course is designed to help students gain a foundational knowledge of inference through simulation in Python.

COURSE OVERVIEW:

This is one of the six core courses in the curriculum of the Master of Data Science (MDS) program. The course covers the fundamentals and principles of probability and modern statistical inference, with a focus on their applications in real-world contexts. Probability and statistics are essential tools in data science and central to fields such as bioinformatics, social informatics, and machine learning. They are the foundation for quantifying uncertainty and assessing support for hypotheses and derived models. They are also at the heart of areas such as the analysis of algorithm efficiency and randomized algorithms. Course content includes probability and random variables, basic statistical concepts, and various methods for statistical inference and regression modeling. This course lays the groundwork for more advanced courses in the MDS program, such as Statistical Machine Learning and Deep Learning.

INSTRUCTOR INFORMATION:

Name: Su Chen

Title: Assistant Teaching Professor

Office: Duncan Hall 2051

Email: [email protected]

Office Hours: TBA

MEETING TIMES & LOCATION:

Lectures: 3 hours/week

Meeting time: MW 2:00-3:15pm

Location: DCH 1070

TEACHING ASSISTANT:

Name: TBA

Email:

PREREQUISITES: College-level calculus (single-variable and multivariable). Students should be comfortable with mathematical reasoning and familiar with vectors and matrices, sequences, limits, infinite series, the chain rule, and ordinary and multiple integrals.

COURSE OBJECTIVES & LEARNING OUTCOMES:

In this course, students will learn how to summarize, explore, and model data to extract useful information and knowledge. Students completing this course will be able to:

● Identify and calculate appropriate summary and inferential statistics, and draw appropriate conclusions from data.

● Program fluently in basic Python, with a focus on simulation-based inference and statistical modeling.

● Use applied statistical knowledge to analyze real-world data, test hypotheses, build regression models, and make scientific inferences.

● Interpret inferences and modeling results in real-world contexts and communicate the findings effectively.

TEXTBOOK AND OTHER RESOURCES:

Recommended Textbooks:

All of Statistics: A Concise Course in Statistical Inference

An Introduction to Statistical Learning

Computer Age Statistical Inference

Software: Python 3 with the standard packages included in an Anaconda installation, such as NumPy, pandas, and Matplotlib, as well as some statistics and machine learning packages.

Other: Course slides, Jupyter notebooks, and assignments will be posted to the Canvas course website.
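You can use a short check like the following to confirm that your Anaconda installation provides the packages named under Software. This is a minimal sketch and not part of the official course materials. The helper name `check_environment` is hypothetical. The script checks only NumPy, pandas, and Matplotlib, because the syllabus does not name the additional statistics and machine learning packages.

```python
import importlib.util
import sys

# Packages named in the syllabus. The unnamed "statistics and machine learning"
# packages are deliberately not checked (assumption: they are course-specific).
REQUIRED_PACKAGES = ("numpy", "pandas", "matplotlib")


def check_environment(packages=REQUIRED_PACKAGES):
    """Map each package name to True if Python can locate it, else False.

    importlib.util.find_spec only locates the package; it does not import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Report the interpreter version, then one status line per package.
    print(f"Python {sys.version.split()[0]}")
    for name, found in check_environment().items():
        print(f"{name:<12}{'found' if found else 'MISSING'}")
```

Run it from a terminal or from a Jupyter notebook cell. Any package reported as MISSING can be installed with Anaconda's package manager.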

COURSEWORK:

The main coursework consists of weekly quizzes and bi-weekly homework assignments.

● Homework assignments will be a mixture of mathematical derivations and programming tasks.

○ Assignments must be completed in Jupyter notebooks, using Markdown and LaTeX for mathematical derivations and Python for programming tasks. Some tutorials will be provided in class.

● Weekly quizzes will be mostly conceptual, in multiple-choice format, and will be posted on Canvas.

● The midterm will be an in-class, closed-book written exam, and the final will be a take-home exam with simulation exercises and real-world data analysis problems.

GRADING POLICIES:

● Assignments: 50%

○ Bi-weekly homework: 30%

○ Weekly quizzes: 15%

○ Course surveys: 5%

● Exams: 40%

○ Midterm exam: 20%

○ Final exam: 20%

● Class participation and attendance: 10%

ATTENDANCE AND MAKE-UP POLICIES:

Active participation in class and in groups is expected. In the event of an excused absence on a date when an assignment is due or a class activity is planned, the student must contact the instructor immediately to arrange a make-up time or assignment.

TENTATIVE SCHEDULE:

Each week lists its start date and theme, followed by the topics covered and the weekly assignments.

8/22 Week 1: Probability I

Topics: Introduction and course logistics; review of probability theory; discrete random variables; continuous random variables; bivariate, marginal, and conditional distributions

Weekly assignments: Install Python 3 and Jupyter Notebook through Anaconda; HW1 assigned

8/29 Week 2: Probability II

Topics: Multivariate random variables; Gaussian random vectors; expectation, variance, and covariance; probability inequalities

Weekly assignments: Quiz 1 due

9/5 Week 3: Statistical Inference

Monday: Labor Day, no class

Topics: Population and random sampling; empirical distribution and sampling distribution; law of large numbers and central limit theorem

Weekly assignments: HW1 due; Quiz 2 due; HW2 assigned

9/12 Week 4: Parametric Inference and Maximum Likelihood Estimation

Topics: Point estimators and confidence intervals; the likelihood function; the MLE and its properties; the delta method

Weekly assignments: Quiz 3 due

9/19 Week 5: Nonparametric Inference

Topics: Parametric and nonparametric bootstrap; bootstrap confidence intervals; kernel density estimation

Weekly assignments: HW2 due; Quiz 4 due; HW3 assigned

9/26 Week 6: Hypothesis Testing I

Topics: General framework of hypothesis testing; p-values and error probabilities; common parametric tests

Weekly assignments: Quiz 5 due

10/3 Week 7: Hypothesis Testing II

Topics: The likelihood ratio test; the goodness-of-fit test; nonparametric tests (permutation test and A/B testing); multiple testing and FDR control

Weekly assignments: HW3 due; practice midterm exam

10/10 Week 8: Midterm

Monday: Midterm recess, no class

Wednesday: Midterm exam

Weekly assignments: HW4 assigned

10/17 Week 9: Bayesian Inference

Topics: The Bayesian method; conjugate families and non-informative priors; empirical Bayes

Weekly assignments: Quiz 6 due

10/24 Week 10: Stochastic Processes

Topics: Markov chains; Poisson processes

Weekly assignments: HW4 due; Quiz 7 due; HW5 assigned

10/31 Week 11: Linear Regression

Topics: Correlation and simple linear regression; least squares and maximum likelihood; multiple linear regression; model comparison and selection

Weekly assignments: Quiz 8 due

11/7 Week 12: Logistic Regression and Generalized Linear Models

Logistic regression

Multinomial

regressio