HUDM5001 - Programming for Data Science, Fall 2025
Lecture: Tue. 11:00am-12:40pm
Credits: 3
Location: GD 277
Instructional Mode: in-person
Course Overview and Learning Outcomes:
This course introduces essential programming concepts, structures, and techniques for data science. Topics include data types, data structures, control statements, and functions in Python, along with the NumPy and pandas libraries. The course also covers version control using GitHub and database management using SQLite. Additionally, the course includes content on data visualization and coding practices using AI.
At the end of the course, students will be able to
(1) Work confidently in an appropriate programming environment (IDE).
(2) Correctly describe basic Python language constructs and write basic Python programs.
(3) Explain version control concepts and work on a data science project using GitHub and Python.
(4) Create a portfolio showcasing their visualization skills.
Prerequisites:
There are no formal prerequisites, but students should have some experience working with a programming language or statistical analysis package, e.g., R, SPSS, Stata, or MATLAB.
Textbook:
McKinney, W. (2017) Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (2nd Edition). O’Reilly Media (available on Columbia Library EResources)
Software or Development Tools:
- Anaconda
- Spyder IDE (or other IDE tools)
- GitHub
- SQLite
Topics Covered
- Intro to GitHub
- Intro to SQL
- Python Programming
  - Intro to Spyder
  - variables and expressions
  - data types: int, float, bool, string, list, tuple, set, dict, range
  - operators
  - input/output
  - NumPy
  - pandas
  - SQLite database
  - control structures
  - iterables and iterators
  - list comprehensions
  - functions
  - lambda functions
  - running scripts at the command line
  - classes
  - unit testing
- Python Data Visualization
  - Matplotlib, Plotnine, Plotly
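For a flavor of a few of the listed topics, the following is a minimal sketch that touches list comprehensions, lambda functions, and an SQLite database. The table name `scores`, its columns, and the sample rows are hypothetical and are not taken from the course materials.

```python
import sqlite3

# List comprehension: build a list of squares in a single expression.
squares = [n * n for n in range(5)]

# Lambda function: an anonymous key function used to sort names by length.
libraries = sorted(["pandas", "numpy", "sqlite"], key=lambda name: len(name))

# SQLite: an in-memory database, so no file is created on disk.
# The table and rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("ana", 90), ("ben", 75)],
)
rows = conn.execute(
    "SELECT student FROM scores WHERE points >= 80"
).fetchall()
conn.close()

print(squares)
print(libraries)
print(rows)
```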
Course Schedule
The following calendar is an outline of the course topics and assignments.
(A = Programming Assignment; Q = Take-home Quiz)
Programming Assignments, Quizzes, the Midterm, and the Final Project
Programming Assignments: The programming assignments consist of focused exercises related to each week’s lectures. You are encouraged to first try to complete each assignment by yourself. If you work with others, make sure that you understand all of the work and that your final submission is your own. Assignments will be posted on GitHub no later than Tuesday and will be due the following Monday at 11:59 pm ET. The total possible points will vary by assignment, and specific grading criteria will be provided with each one. The CAs will review submissions and comment on them. Assignments are expected to be completed by their due dates. Late assignments are penalized: 10% of the total score will be deducted for each day past the due date. The assignment with the lowest score will be dropped when computing the final letter grade at the end of the semester.
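The late penalty and the drop-lowest rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the official grading procedure: it reads "10% of the total score" as 10% of the assignment's total possible points, counts lateness in whole days, floors the result at zero, and compares assignments by percentage when dropping the lowest one. The function names are hypothetical.

```python
def late_penalized_score(raw_score, total_points, days_late):
    """Apply the late penalty to one assignment.

    Assumption: 10% of the total possible points is deducted per whole
    day late, and the score never goes below zero.
    """
    penalty = 0.10 * total_points * max(days_late, 0)
    return max(raw_score - penalty, 0.0)


def assignment_average(percent_scores):
    """Average the percentage scores after dropping the single lowest one.

    Assumption: assignments are compared by percentage, since the total
    possible points vary from assignment to assignment.
    """
    if len(percent_scores) < 2:
        raise ValueError("need at least two assignments to drop one")
    kept = sorted(percent_scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


# A 50-point assignment scored 45 and submitted two days late.
penalized = late_penalized_score(45, 50, 2)

# Four assignment percentages; the lowest (70) is dropped.
average = assignment_average([90, 70, 100, 80])

print(penalized, average)
```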
Quizzes: There will be several quizzes throughout the semester to assess your knowledge of the course topics. Quizzes are based on the Jupyter Notebooks. All quizzes are mandatory. Importantly, quizzes are “closed book”: you should not consult any resources, including notes, books, the web, devices, or other external media. Quizzes will be administered via the HonorLock system. No late policy will be applied. If you know in advance that you will miss a scheduled quiz, you must make arrangements with the instructor at least one week in advance if possible; if an unforeseen event prevents you from taking a quiz, contact the instructor as soon as you are able.
Midterm: The midterm will cover Weeks 1–8 and will be an in-person exam. You will be allowed one single-sided 8½ x 11 inch “cheat sheet” and a calculator.
Presentation: The instructor will place you into a group of 3-4 students. Your group will give a presentation on one topic we have discussed in the course. The instructor will provide a list of presentation topics, and each group must choose one. The sign-up sheet link will be posted in a Canvas announcement, starting Wednesday, November 12 at 8 am. The presentation is a 7-minute oral presentation on the chosen topic; it should explain one or two concepts and demonstrate them with a few examples. You will use Jupyter Notebooks or PowerPoint slides, and all group members will present together on the presentation date. Be sure to practice beforehand and time yourselves before you give the presentation.
Final Project: You will work with the same group as for the presentation. Pick a dataset that you and your group find interesting. Example sources are listed in the Data section below. Feel free to select your data from any other appropriate source.
The final project should pose a research question and perform data pre-processing, data cleaning, outlier removal, and so on to sanitize your data as necessary. Explore your data to reveal interesting or useful information based on your project scenario, and create at least 2 visualizations that you find interesting or useful. Also, do at least one of the following, depending on your interests and background: (i) compute meaningful statistical quantities (e.g., means, correlations), (ii) perform a statistical test on the data (e.g., t-test), or (iii) fit a model to the data (e.g., regression).
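As a rough illustration of the cleaning and analysis steps described above, the sketch below drops incomplete rows, removes an out-of-range value, and then computes a mean, a Pearson correlation, and a least-squares regression line. The dataset (hours studied versus exam score) is made up, the 0-100 validity rule is an assumed cleaning criterion, and only the standard library is used so the example stays self-contained; in a real project, pandas and NumPy would be the more typical tools.

```python
from math import sqrt
from statistics import mean

# Hypothetical records: (hours_studied, exam_score). None marks a missing value.
records = [(2, 65), (4, 70), (None, 80), (6, 78), (8, 85), (10, 90), (5, 300)]

# Data cleaning: drop rows that contain a missing value.
complete = [(x, y) for x, y in records if x is not None and y is not None]

# Outlier removal (assumed rule): exam scores must lie between 0 and 100.
clean = [(x, y) for x, y in complete if 0 <= y <= 100]

xs = [x for x, _ in clean]
ys = [y for _, y in clean]
mean_x, mean_y = mean(xs), mean(ys)

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in clean)
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# (i) A statistical quantity: the Pearson correlation coefficient.
r = sxy / sqrt(sxx * syy)

# (iii) A simple model: least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"n={len(clean)} mean_score={mean_y:.1f} r={r:.3f}")
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
```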
The final report should include the following sections: abstract, introduction, data, data processing methodology, results, and conclusions. You should also submit your Python code, annotated in enough detail that peers can easily reproduce your work. The files can be Jupyter Notebooks or Python scripts. The report is limited to 10 pages (double spaced; excluding the appendix) and should be written as coherently as possible. More details about the final project will be announced on Canvas and GitHub.
Attendance: Regular attendance is required for this course. Missing more than three sessions will have a negative impact on your final grade.
Data
For your final project, you will analyze real data and draw meaningful conclusions with regard to your research questions. Here is a list of websites where you can find interesting data.
- Kaggle
- AWS Open Data
- data.world
- ICPSR
- The Google Dataset Search
- The UCIML Repo
- The CMU data repository
- The datasets subreddit
- Tycho
- Data Portals
Delivery Mode Expectations
Students complete assigned reading before live sessions.
In-person live sessions will consist of:
- the instructor gives code demos
- students work on small and larger coding assignments, with assistance from instructor/CAs/potentially their peers
- the instructor reviews coding solutions with the class
- students submit assignments through Canvas
Note that this course is conducted in person. In-person lectures will not be recorded using Zoom or any other recording tool. However, in the event of an emergency such as COVID-19 or a natural disaster, lectures may be recorded via Zoom.
Electronic Submission of Assignments
All assignments must be submitted electronically through Canvas by the specified due dates and times. It is important to complete all assigned work; failure to do so will likely result in failing the class.
Class Management
Email / Communication
- Email is the best way to get in touch with the teaching staff: professor and CAs.
- Please be sure to include the course number ("HUDM5001") in your email subject line when sending email to any of the teaching staff.
Grading
Courses at Teachers College use the following grading system: A+, A, A-; B+, B, B-; C+, …, F. The symbol W is used when a student officially drops a course before its completion or if the student withdraws from an academic program of the University.
Requirements and their weights in the final grade:
1. Programming assignments (drop one): 25%
2. Quizzes: 30%
3. Midterm: 20%
4. Presentation: 10%
5. Final Project: 15%
| If your weighted total points are … | Your final letter grade is … |
|---|---|
| [93, 100] | A |
| [90, 93) | A- |
| [83, 90) | B |
| [80, 83) | B- |
| [73, 80) | C |
| [70, 73) | C- |
| < 70 | F |
Note that A+, B+, and C+ will be determined by the class curve and overall performance in the course.
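The weighting and cutoffs above can be expressed as a short Python sketch. It assumes each component is scored on a 0-100 scale (with the lowest assignment already dropped) and does not model the curve-based A+, B+, and C+ grades. The dictionary keys and function names are hypothetical.

```python
# Component weights from the syllabus (they sum to 1.0).
WEIGHTS = {
    "assignments": 0.25,
    "quizzes": 0.30,
    "midterm": 0.20,
    "presentation": 0.10,
    "final_project": 0.15,
}

# Lower bounds of each letter-grade interval, highest first.
CUTOFFS = [(93, "A"), (90, "A-"), (83, "B"), (80, "B-"), (73, "C"), (70, "C-")]


def weighted_total(scores):
    """Combine component scores (each assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total):
    """Map a weighted total to a letter; plus grades are not modeled."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


# Made-up component scores for one student.
example = {
    "assignments": 90,
    "quizzes": 85,
    "midterm": 80,
    "presentation": 95,
    "final_project": 88,
}
total = weighted_total(example)
print(round(total, 1), letter_grade(total))
```

Because the intervals are closed on the left and open on the right, a total of exactly 90 maps to A- while 89.99 maps to B.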