Introduction to Data Science (DS-UA 112)

Summer Semester 2021

§0.0 Purpose and design: This is a survey course. It has been designed to achieve several specific goals. First, it is supposed to introduce you to foundational concepts in the field of data science. Second, we aim to impart the 21st-century version of a liberal arts education. The 3 classical Rs of Reading, wRiting and aRithmetic are now joined by 2 new ones: data liteRacy and pRogramming. In this class, we assume that you are already somewhat familiar with the first 3 Rs and focus on the latter 2. Third, we aim to plant a variety of seeds about topics that you will encounter again in more advanced classes. Fourth, we hope to kindle a passion for data and data analysis that will last a lifetime. Finally, we also intend to impart several general purpose skills (e.g. coding in Python, the QDAFI method, etc.). Overall, the class is dedicated to the philosophy of computational empowerment. We live in transformational times. We believe that this mindset as well as these concepts are essential to a flourishing existence in the 21st century and beyond.


§1.0 Instructor:

Pascal Wallisch, PhD

Office:

6 Washington Place (Meyer Hall), Room 402

Phone:

(212) 998-8430

Email:

[email protected]

Office hours:

Friday 11.00 pm - 1.00 am (REMOTE: https://nyu.zoom.us/j/303123378)

§1.1 TAs: All TA office hours are by appointment (via Calendly link). Numbers are Zoom room IDs.

Prerna Mishra (Calendly link) Zoom: 97015607439

Hörmet Yiltiz (Calendly link) Zoom: 98762396456

Stephen Spivack (Calendly link) Zoom: 9882770151

Sarah Espinosa (Calendly link) Zoom: 4209794454

TA email: intro2dsnyu@gmail.com

§1.2 Session times: Mo, Tu, & We 3:00 - 5:10 pm

§1.3 Session space: Remote in https://nyu.zoom.us/j/98108611854

§1.4 Session content: There are 3 sessions introducing new content (both concepts and code) each week. Please attend these remotely via Zoom. If these times don’t work for you (e.g. because you live in another time zone), you can also just watch the recordings, but live is more lively, so join us if you can. Sometimes, we will have guest lecturers who will advise on professional development.

§1.5 Quora (Open forums):

Code questions: Thursday at 9 pm in 97015607439

Non-code questions: Friday at 9 pm in 4209794454

Anything goes: Saturday at 9 pm in 98762396456

§1.6 Prerequisites: DS4E or equivalent

§1.7 Scope: 0.01 to 1. The language of instruction is Python; we index from 0.

§1.8 Materials:

Concepts: “Data Science from Scratch: First Principles with Python”, by Joel Grus

Linear Algebra: “Linear Algebra: Theory, Intuition, Code”, by Mike X Cohen

Coding: “Neural Data Science”, by Nylen and Wallisch

§1.9 Assignments: These are designed to foster and encourage conceptual proficiency. There is one problem set, one QDAFI response paper and one function due per theme block.


§2.0 Course grading: The total grade is calculated as follows:

Component                                       Rate            Total
A)  After action assessments (participation)    1% / lecture    15%
B)  1 Big data analysis project                 16%             16%
C)  1 Course logistics quiz                     1%              1%
E)  1 Exam (cumulative)                         20%             20%
F)  6 Functions (Python)                        2% / function   12%
G)  Grace (Elysium, Fuggerei)                   10%             10%
P)  6 Problem sets                              2% / set        12%
Q)  6 QDAFI response papers                     2% / paper      12%
S)  1 Intake survey                             1%              1%
X)  1 Exit survey                               1%              1%
Total                                                           100%

§2.1 Grade cutoffs:



Grade   Range (%)
A       95-100
A-      90-94.9
B+      87-89.9
B       83-86.9
B-      80-82.9
C+      77-79.9
C       73-76.9
C-      70-72.9
D+      65-69.9
D       60-64.9
F       30-59.9
I       0-29.9
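
As a rough illustration of how the component weights in §2.0 and the cutoffs in §2.1 fit together, here is a minimal Python sketch. It is not the official grading code. The names `WEIGHTS`, `CUTOFFS`, `total_grade` and `letter_grade` are made up for this example. It assumes that each component is scored as a fraction between 0 and 1, that the grade listed between A and B+ is A-, and that a total falling between two listed ranges (e.g. 89.95) receives the lower grade.

```python
# Weights from the grading table, in percentage points of the total grade.
WEIGHTS = {
    "A": 15,  # After action assessments (participation)
    "B": 16,  # Big data analysis project
    "C": 1,   # Course logistics quiz
    "E": 20,  # Exam (cumulative)
    "F": 12,  # Functions (Python)
    "G": 10,  # Grace
    "P": 12,  # Problem sets
    "Q": 12,  # QDAFI response papers
    "S": 1,   # Intake survey
    "X": 1,   # Exit survey
}

# Lower bound of each grade range, highest first.
CUTOFFS = [
    (95, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (65, "D+"), (60, "D"),
    (30, "F"), (0, "I"),
]


def total_grade(fractions):
    """Weighted total in percent.

    `fractions` maps a component letter to the fraction of that
    component earned (0.0 to 1.0); missing components count as 0.
    """
    return sum(WEIGHTS[key] * fractions.get(key, 0.0) for key in WEIGHTS)


def letter_grade(total):
    """Return the first grade whose lower bound the total reaches."""
    for lower_bound, letter in CUTOFFS:
        if total >= lower_bound:
            return letter
    raise ValueError("total must be non-negative")


if __name__ == "__main__":
    # Hypothetical student: full marks everywhere except 80% on the exam.
    fractions = {key: 1.0 for key in WEIGHTS}
    fractions["E"] = 0.8
    total = total_grade(fractions)
    print(total, letter_grade(total))
```

Storing only the lower bound of each range keeps the lookup to a single pass and avoids having to decide what happens in the small gaps between listed ranges.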

§2.2 Extra credit opportunities:

There are several extra credit opportunities in the class.

1. Problem sets: Students are expected to do 6 problem sets for full credit. Students can do an additional problem set for extra credit (that will replace the lowest score received).

2. Response papers: Students are required to do 6 papers for full credit. They can do a 7th as extra credit, which will replace the lowest paper grade.

3. Functions: Students need to write 6 functions for full credit. They can do a 7th one for extra credit, which will replace the lowest function grade.

4. MIG (Meme or Infographic): Make a meme or infographic of a course concept (e.g. PCA) for an extra 1% grade score.

5. WOW (What one wonders): Write about an interesting issue or problem that you wonder about, which might lend itself to be addressed or resolved by a data-based approach.
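
The "replace the lowest" rule in items 1 to 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `replace_lowest` and the score scale are hypothetical, and we assume the extra submission replaces the lowest score only when that raises the grade.

```python
def replace_lowest(required_scores, extra_score):
    """Return a new list in which the lowest required score is
    replaced by the extra-credit score, if the extra score is higher.

    Assumption: an extra score that is not higher than the current
    lowest score leaves the grades unchanged.
    """
    scores = list(required_scores)  # copy, so the input is not modified
    lowest_index = min(range(len(scores)), key=scores.__getitem__)
    if extra_score > scores[lowest_index]:
        scores[lowest_index] = extra_score
    return scores


if __name__ == "__main__":
    # Hypothetical scores for 6 problem sets plus a 7th for extra credit.
    print(replace_lowest([10, 7, 9, 8, 10, 6], 9))
```

Under this assumption, the result is the same as keeping the best 6 of the 7 submitted scores.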

§2.3 Attendance and Participation: You are responsible for the material covered in this course. Thus, consistent attendance is critical, as the exam will focus on the material discussed during lecture, and labs will be crucial to clarify the subject material. Also, we assign a participation grade, which counts as 15% of the total class grade, at a rate of 1% per lecture. So you need to attend a minimum of 15 lectures (out of 17) to get a full participation score.

§2.4 Workload: You should expect to spend about 15 hours total per week on this class: 6.5 in lecture and lab, ~1 in office hours, ~6 doing the weekly assignments and ~1.5 doing the readings.

That’s a lot, but not unreasonable. Remember that you are going to learn many new statistical, computational and coding concepts in this class. There are no shortcuts. Immersion is key. This course is designed akin to developing an atomic bomb: a necessarily large investment of time and resources, but with a potentially high yield and the transformational prospect of changing everything forever. This applies in particular to the summer version of this class.

§2.5 Theme blocks: The class material is grouped into 6 major theme blocks: I: Theoretical foundations, II: Characterizing data, III: Predictions from data, IV: Inferences from data, V: Enhanced hypothesis testing - beyond p, VI: Machine learning. As you can see, this is an introductory survey class that serves as a foundation for more advanced classes. Should you already know about a particular topic, please understand that it is unlikely that this is the case for everyone. This means we still have to cover all of these topics, as we need to onboard everyone.



§3.0 COURSE SCHEDULE

Week/Block                          Monday                    Tuesday                           Wednesday
I: 07/05-07/09 Foundations          Independence Day          a: Welcome                        a: Probability II
                                    NO CLASS                  b: Probability I                  b: Lab I
II: 07/12-07/16 Characterization    a: Linear Algebra I       a: Central Tendency               a: Lab II
                                    b: Linear Algebra II      b: Dispersion                     b: Correlation
III: 07/19-07/23 Prediction         a: Linear Regression      a: Control                        a: Model design
                                    b: Lab III                b: Multiple regression            b: Lab IV
IV: 07/26-07/30 Inference           a: Sample & Population    a: Parametric tests I             a: Nonparametric tests
                                    b: NHST                   b: Parametric tests II            b: Lab V
V: 08/02-08/06 Beyond p             a: Resampling methods     a: Lab VI                         a: Bayes II
                                    b: Effect size & Power    b: Bayes I                        b: Lab VII
VI: 08/09-08/13 Machine learning    a: Logistic Regression    a: Lab VIII                       a: Lab IX
                                    b: PCA                    b: Clustering & Classification    b: Grand finale

B: Big data analysis project due date: August 19th

E: Examination (remote take home): Released August 18th, due August 20th