
APMA E4990.001 Topics in Applied Math: Mathematics of Data Science Spring 2023

Published: 2023-03-09



Description: This course is an application-oriented introduction to mathematical concepts and techniques used in data science, with a balanced combination of theory, algorithms and programming implementations. A provisional list of topics includes:

Unsupervised learning/dimensionality reduction: PCA, matrix factorization, matrix completion, clustering, problems on graphs and convex relaxations;

(Self)supervised learning: regression (including sparse regression and other regularization techniques), classification (including logistic regression and support vector machines), compressed sensing, kernel methods, and mathematical aspects of deep learning (including convolutional neural networks and neural models for sequential data and graphs); and

Learning with incomplete information/policies for interaction with the environment: “bandit” problems, Markov decision processes and mathematical aspects of reinforcement learning.

Mathematical topics will include selected tools of linear algebra (e.g., singular value decomposition and spectral decomposition, randomized linear algebra), probability, duality theory, minimax theorems, Fourier transforms and convolutions. We will use these tools to understand the essential character, power, and limitations of techniques used in data science.

Algorithmic topics will include selected techniques for optimization, such as linear programming, gradient descent, stochastic gradient descent, conjugate gradients, and alternating direction method of multipliers. Applications will emphasize data science, and the required coursework will include programming exercises to implement some of the relevant data science techniques.

Textbook: The main textbook for the course is Linear Algebra and Learning from Data by Gilbert Strang (Wellesley-Cambridge Press 2019, ISBN: 978-06921963-8-0). I have requested to reserve a copy of this textbook at the Science & Engineering Library. Freely available supplemental materials may be referenced on the weekly schedule.

Additional materials: Prof. Strang has a collection of video lectures online that cover some of the same material we will cover.

While the following courses are more advanced than our class, they contain possible directions for further study, open problems and research projects in mathematics of data science.

● Mathematics of Data Science, Prof. Afonso Bandeira, ETH Zurich

● Mathematical Tools for Data Science, Prof. Carlos Fernandez-Granda, NYU

Prerequisites: Students are expected to have basic knowledge of multivariable calculus (on the level of APMA E2001), linear algebra (on the level of APMA E3101), and elementary probability (on the level of IEOR E3658). Basic programming skills are required as well. As discussed below, Python is highly recommended for the homework assignments.

Class website and schedule: The lectures will take place on Mondays and Wednesdays, 11:40am-12:55pm. The course materials and schedule will be available on Courseworks. If you have a foreseeable conflict, you must let me know as soon as possible. We will use Ed Discussion (accessible via Courseworks) for course-related announcements and discussion.

Homework: Homework will be graded strictly because its main purpose is to give you practice in presenting mathematical reasoning clearly and completely. The grading will take into account presentation as well as the correctness of answers: be sure to show all work in a logical order, and use complete sentences where narrative is called for.

The recommended language for the programming homework is Python; we will often provide ancillary subroutines in Python that can be used in the homework, e.g., for data loading, cleaning and visualization. While students can also submit programming homework solutions in another programming language, such as MATLAB, no ancillary subroutines will be provided in that language, and therefore students using any language other than Python will need to develop them on their own.

Late homework will not be accepted except in the case of a documented medical or similar excuse. Students are encouraged to form study groups and to discuss homework problems with each other. However, for both programming and other problems, each student must implement and present their own solutions. Sharing solutions or copying another person’s solution or other materials is not permitted.

Quizzes:  There will be three quizzes.  In preparing for them, make sure you understand the correct solutions to as many of the homework problems as possible, as well as the statements and proofs of theorems and the definitions of key terms covered in class.  Questions in the quizzes will be based on a selection of the homework questions and the material covered in the lectures. No collaboration is permitted on the quizzes.

Project Option: Students who are interested in conducting independent research may submit a research proposal for approval to the instructor. The goal of the project is to investigate a specific application of mathematics to data science. The project should be carried out in groups of two (the instructor may approve exceptions to this requirement in case of PhD-related research which can be performed individually). Submitting a project used for academic credit or otherwise in connection with another course at Columbia or elsewhere is not allowed.

The proposal must be one page long. It should include the following information (each point will be evaluated separately):

A description of a specific question that you want to explore. The question can be experimental, i.e. whether a certain technique works for a particular application, or theoretical, whether a certain theoretical tool can be used to analyze a data-analysis method. Be as concrete as you can.

Context for your topic, including relevant bibliographic references.

An outline of what you plan to do, including two major milestones, and a precise justification of how it relates to the question that you are studying. For experimental projects, this includes a description of the dataset you plan to use.

The project report should be written in LaTeX and be no more than 5 pages of main text (not including references and appendices). The contribution and significance of the report will be evaluated primarily based on the main text (without appendices), and so enough details must be provided in the main text to convince the reader of the report’s merits. The report should include the following sections, which will be evaluated separately:

Introduction: What question have you been studying? Why is this question relevant/impactful?

State of the art: Describe the state of the art methods/results for answering this question, with relevant bibliographic references.

Methodology: How did you address the question? Did you modify existing methods? What datasets did you use? What theoretical tools did you apply? If you have deviated from your original proposal, explain why.

Results:  What results did you obtain?  Do they make sense?  Provide a thorough analysis.  Negative results are completely fine (they can be very valuable!), but please explain clearly what worked and what did not.

Discussion: What did you find out? How does your work fit into the context of the current state of the art? Do the results suggest any other interesting questions to explore?

Grading: The course grade will be determined as follows: 70% homework and 30% quizzes. The project proposal replaces one problem set. If the project proposal is approved, the grades for the two milestone deliverables and the final project paper will replace the quiz grades, and the project presentation will replace the last homework.

Extra credit may be given to students who regularly and correctly answer their classmates’ questions on Ed Discussion and otherwise contribute to class-related discussions. The grade boundaries and extra credit awards will be determined by the instructor at the end of the course.
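As a rough illustration of the 70%/30% weighting described above, the following Python sketch computes a weighted course score. It is not an official grading formula: the function name course_score is hypothetical, and it assumes that every score is on a 0-100 scale and that items within each category count equally, neither of which the syllabus states. Extra credit and grade boundaries, which are set by the instructor at the end of the course, are not modeled.

```python
def course_score(homework_scores, quiz_scores, hw_weight=0.7, quiz_weight=0.3):
    """Weighted course score on a 0-100 scale.

    Assumptions (not stated in the syllabus): all scores are on a
    0-100 scale and items within a category are weighted equally.
    """
    if not homework_scores or not quiz_scores:
        raise ValueError("need at least one homework score and one quiz score")
    hw_avg = sum(homework_scores) / len(homework_scores)
    quiz_avg = sum(quiz_scores) / len(quiz_scores)
    return hw_weight * hw_avg + quiz_weight * quiz_avg


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(course_score([80, 100], [100, 100, 100]))
```

For a student with an approved project, the same sketch would apply with the two milestone grades and the final paper grade passed in place of the quiz scores.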

Academic integrity: Plagiarism and cheating will not be tolerated. Columbia University policies in this area will be followed. See https://www.collegecolumbia.edu/academics/academicintegrity