
DATA641/MSML641 Natural Language Processing Spring 2021


Syllabus for DATA641/MSML641

Natural Language Processing

Spring 2021


· Main text:

Jurafsky and Martin, 3rd edition (draft).

· Class announcements and discussion:

We will be using Slack. You are strongly encouraged to read and participate in class discussions and Q&A, and to post questions on Slack rather than e-mailing the professor or TA. This is the invitation link:


· Schedule:

See the schedule of topics on the home page of the class Canvas site.

For information regarding official University closings and delays, see the campus website and the weather emergency phone line (301-405-7669). In case of any unexpected changes, such as rescheduled exams and class assignments, or cancellations due to inclement weather even though the University remains open or is delayed in opening, please check the class Canvas announcements for updates.

· Turning in homework:

For each assignment, please upload your submission to Canvas. You should already have been auto-enrolled in DATA641, but reach out to us if you have not been. Each submission will generally include the files assignment.py and writeup.pdf. However, please read each assignment carefully in case we specify something different. When you upload the required files, the assignment will be auto-graded against a suite of private tests, which are not the same as the public tests given to you for testing your code.

What's the course about?

This course will introduce fundamental concepts and techniques involved in getting computers to deal more intelligently with human language. It is focused primarily on text (as opposed to speech) and will offer a grounding in core NLP methods for text processing (such as lexical analysis, sequential tagging, syntactic parsing, semantic representations, text classification, unsupervised discovery of latent structure), key ideas in the application of deep learning to language tasks, and consideration of the role of language technology in modern society.

The content of this course will be substantially similar to Computational Linguistics I, though with some adjustments geared toward longer/fewer lectures and emphasizing practical rather than theoretical concerns.


You are assumed to have taken DATA603 or MSML603, which is an introduction to machine learning and statistical pattern recognition, and therefore you should be familiar with topics covered there including but not limited to maximum likelihood estimation, Bayes' rule, k-nearest-neighbors, support vector machines, neural networks, deep learning networks, dimensionality reduction, and data clustering. You're also expected to be comfortable programming in Python.

That said, here are some useful background resources:

· Unix. Ken Church's venerable Unix for Poets is still one of the nicest concise introductions to fundamental and relevant Unix commands. SLP Section 2.4 contains some content inspired by Church's tutorial.

· Python. You are assumed to be competent and comfortable writing and debugging code in Python. The NLTK book has useful background and Python basics using NLP examples, and NLTK is widely used as a Python toolkit, though I find it more valuable for pedagogical purposes than for real-world use. For actually getting stuff done, my go-to Python toolkit is spaCy (spacy.io), and that's what I use in this class.

· Linear algebra. I recently came across 3Blue1Brown on YouTube, and immediately became a huge fan. These are videos that provide incredibly intuitive explanations for mathematical ideas, particularly for people like me who think very visually. Even for people who are already comfortable with the mechanics of linear algebra, I very strongly recommend the 3B1B series on linear algebra, because the explanation of core concepts is incredibly helpful in understanding what's going on with neural networks. And those lead quite nicely into their really nice videos introducing the fundamentals of deep learning in a similarly visual and intuitive way.

· Probability and statistics. You're assumed to have the basics. Again, 3B1B also has some really nice videos; I particularly recommend the video on Bayes' theorem.

· Fundamental machine learning techniques. Again, you're assumed to have machine learning as a prerequisite. But A Course in Machine Learning is useful (particularly chapters 1-5 and 7) and Jordan Boyd-Graber has devoted a bunch of time to creating a really nice machine learning playlist on YouTube.

How will class be structured?

Teaching and learning in a hybrid format is, of course, a challenge. Much of the class is going to be in lecture format, though I am also planning to make it interactive for both groups of people: online and in-person.

A 2.5-hour slot makes for a very long class, especially in the evening. I plan to include one 15-minute break per class and to make sure there's opportunity for discussion so you're not just staring at me on a screen or in person the whole time.

Although there's no avoiding some detail work at the board in a course like this, I don't particularly like slogging through details -- I believe that detailed working-through is your job, either when you're doing the reading ahead of class (which you should make sure to do!), going through things afterwards (also a good idea!), or both. I view my primary job as making sure you understand the ideas, and that you have what you need to work through those details and understand why you're doing it.

Note that I rarely teach with slides. I expect you to take notes. If you're not in class for some reason, I expect you to get the notes you need from someone else or listen to the recorded video.

Related to that last point, I very strongly encourage you to form study groups. Your classmates are a great resource, and it will definitely improve your experience of the class.

How will the course be graded?

Course grades will be assigned as follows:

97.00+         A+
93.00-96.99    A
90.00-92.99    A-
87.00-89.99    B+
83.00-86.99    B
80.00-82.99    B-
77.00-79.99    C+
73.00-76.99    C
70.00-72.99    C-
60.00-69.99    D
0.00-59.99     F

I reserve the right to curve up the threshold (i.e., a lower point value may result in a higher grade), but I will not curve down (i.e., a higher point value will not result in a lower grade). The thresholds will be placed uniformly for the entire class.

Please note that if the final grade tabulation comes out to be 79.98, then that corresponds to a C+; I have been exact in the above specifications deliberately. I am sorry, but if I negotiate on any of these cutoffs, I then need to negotiate on the next one (e.g., if I rounded 79.95 up, then I would get harassed about 79.94). Especially for large classes, this results in chaos.
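To make the no-rounding rule concrete, here is a minimal sketch of how the uncurved cutoffs in the table above could be applied. The names CUTOFFS and letter_grade are hypothetical, chosen for illustration only, and the sketch ignores any upward curve the instructor may apply.

```python
# Cutoffs copied from the grade table above: (minimum score, letter).
# Assumption: a score is compared against the lower bound of each band
# exactly as tabulated, with no rounding step.
CUTOFFS = [
    (97.0, "A+"), (93.0, "A"), (90.0, "A-"),
    (87.0, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.0, "C+"), (73.0, "C"), (70.0, "C-"),
    (60.0, "D"),
]


def letter_grade(score: float) -> str:
    """Return the letter for a final score using the uncurved cutoffs."""
    for minimum, letter in CUTOFFS:
        if score >= minimum:
            return letter
    return "F"


print(letter_grade(79.98))  # C+ (not rounded up to B-)
print(letter_grade(80.00))  # B-
```

A curve, if applied, would lower these minimums uniformly for the whole class; it would never raise them.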

Components of the total grade are as follows:

45% Homework - These are graded on a coarse 5-point scale, corresponding to: great (you totally nailed it, and probably went above and beyond what's required); good (you did everything that's required really well); pass (you did a solid job on everything that's required, mostly well); low pass (there are some parts of the assignment you really didn't seem to get); and fail (you may have done ok on some component of the assignment, but we don't feel like you demonstrated enough mastery of the material to consider the assignment complete).

Typically, students earn good or pass, although we love to see assignments that earn great. These are generally one-week assignments, though it's also possible to have a half-assignment (worth 50% of a regular homework); I don't plan to give any multi-week assignments other than the final project (see below). Regardless, the amount of time given for the assignment is calibrated to the amount of work that should be involved and the amount of credit you'll get for the assignment; for example, a particular homework might be described as a two-week assignment, meaning that you'll have two weeks to do it and you'll receive two homeworks' worth of credit for it. Assignments may involve on-paper exercises (e.g., walking through algorithms or calculations), hands-on programming, or analysis of data. In a typical semester there are four or five assignments, mostly during the first half of the semester. Usually, the second half of the semester, after the midterm, is focused on the final project.

Because we have a mix of people in this class, it's possible that for some homework assignment, the work may already be really familiar to you. One possibility would be for you to just treat it as an easy assignment. However, if you're interested in more of a challenge, I am open to your proposing (after reading the assignment) a more advanced variation connected with the assignment's goals. I won't give you more time or extra credit for it but if you want to do something in a more useful/interesting way I'm happy to discuss it.

I am comfortable with students working together on assignments in part or in whole, and in fact I encourage it; if you'd like to do that, please read the discussion about cooperation vs. cheating below carefully and talk with me in advance if there is any uncertainty, so that we can discuss how to make sure you stay on the right side of the university's policies on academic dishonesty.

25% Midterm exam - This will be a take-home exam, and it will not involve programming. I often have a mixture of students, some of whom are able to work mostly on weekdays, others who really have most of their time on weekends; therefore, I typically will hand out the exam toward the middle or end of the week and have it due at the end of the weekend. But this does not mean that you're supposed to spend all that time working on the exam. If you have mastered the content and are able to think critically about what we have covered in class, it shouldn't take any more time than typical take-home exams in other classes. I'm just giving you more wall-clock time for your flexibility.

25% Final project - This will be structured as a significant team project that will involve programming and thoughtful data analysis. It typically involves a realistic (or even real-world) problem that I will give you -- you will have some flexibility in what you do, but you won't be designing your own projects. It is extremely important that you devote significant time and attention to quality when writing up the project; don't leave the writing to the last minute, because the writeup is what gets graded. Team size should be 3-4 people and you are responsible for forming your own teams.

The project will be due on May 3rd, no extensions. I wish I could give you longer, but that only gives me two days to grade all the projects before grades need to be submitted, so it's already tight.

5% Class participation - I care enough about participation to make it part of the grade. It may be a small part, but it's definitely been known to tip the balance from a B+ grade to an A-, so please don't neglect it. Participation in class and on Slack both count. This is necessarily subjective, because I am judging both the quantity and quality of your participation, but the calibration is pretty straightforward. Things that push toward the top of the 5-point scale include regularly asking relevant questions, volunteering answers (even if they're wrong!), and helping make the class discussion interesting. If you show up to class prepared and contribute to the conversation in some way every couple of classes, you'll typically get 3 out of 5 points. If you are regularly sitting in class but participating rarely or not at all, you might get 1 point for showing up. If you don't show up consistently, you'll get zero.
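As a rough illustration of how the four components above combine, here is a small sketch. It rests on assumptions that the syllabus does not state: every component is taken to be already expressed on a 0-100 scale (how the 5-point homework and participation scales map to percentages is not specified here), and homework is averaged by credit, so a half-assignment counts 0.5 and a two-week assignment counts 2. The names WEIGHTS, homework_average, and final_score are hypothetical.

```python
# Component weights in percent, taken from the list above.
WEIGHTS = {"homework": 45, "midterm": 25, "project": 25, "participation": 5}


def homework_average(graded):
    """Credit-weighted mean of (score, credit) pairs.

    Assumption: score is on a 0-100 scale and credit is 1.0 for a regular
    homework, 0.5 for a half-assignment, 2.0 for a two-week assignment.
    """
    total_credit = sum(credit for _, credit in graded)
    if total_credit == 0:
        raise ValueError("no homework credit recorded")
    return sum(score * credit for score, credit in graded) / total_credit


def final_score(components):
    """Weighted total of component scores, each assumed to be on 0-100."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / 100


# One regular homework and two half-assignments (illustrative numbers only).
hw = homework_average([(90, 1.0), (80, 0.5), (100, 0.5)])
total = final_score(
    {"homework": hw, "midterm": 80, "project": 85, "participation": 60}
)
print(hw, total)  # 90.0 84.75
```

With these made-up numbers, participation contributes only 3 of the 84.75 points, but near a cutoff that can be enough to change the letter grade.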

Policy for Incomplete Work

· Late assignments. If an assignment is late by up to 24 hours, the grade will be reduced by 20%. If it is late by more than 24 and up to 48 hours, the grade will be reduced by 40%. After 48 hours, no credit. Potential exceptions include the one-time Late Assignment Exception (see next bullet), urgent medical issues, family emergencies, or other valid reasons we can discuss if necessary. What's crucial is that if you do have a problem or issue, you talk to me about it as soon as possible. I can tell you in advance that there are several common problems I will not consider as valid reasons for failing to get work in on time. These include (a) failure to manage your time properly, including being busy with another course, a piece of research, or a paper submission deadline; (b) discovering an assignment is harder than you expected it to be (see item a); and (c) losing code or data that should have been backed up, unless it's clearly someone else's fault. If you're not already backing up anything on your home computer or laptop that's important, you should be!

· Late assignment exception. Each student can ask to extend an assignment due date by 48 hours once during the semester, no questions asked, as long as the request takes place before the assignment is due. Requests should be sent to the TA with cc to the instructor. (E-mailing the request at the time the assignment is due, in place of turning in the assignment, is ok. Make sure you cc yourself, so you've got a timestamp on the request.) Note that if you are taking the Late Assignment Exception, you cannot extend the due date further by accepting the 20% or 40% penalty. If it's not turned in within 48 hours after the original due date, it won't be accepted.

· 'Incomplete' as a grade. I will not issue an 'incomplete' as a course grade except for serious, valid reasons, generally in the category of serious emergencies. If you are having problems of any kind, please talk to me as soon as possible. In the event that a medical issue interferes with any class requirement, you are required to let me know in advance or as quickly as can reasonably be expected, and to provide documentation signed by a health care professional.
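The late-assignment rules above can be summarized in a short sketch. It assumes that "reduced by 20%" means the earned grade is multiplied by 0.8 (and by 0.6 for the 40% reduction), which is one reading of the policy rather than a confirmed one. The function name late_credit and its parameters are hypothetical, and case-by-case exceptions such as medical issues are not modeled.

```python
def late_credit(hours_late: float, used_exception: bool = False) -> float:
    """Fraction of the earned grade kept for a submission.

    hours_late is measured from the original due date. used_exception
    marks the one-time 48-hour Late Assignment Exception.
    """
    if hours_late <= 0:
        return 1.0  # on time
    if used_exception:
        # The exception moves the deadline by 48 hours with no penalty,
        # and cannot be stacked with the 20% / 40% late penalties.
        return 1.0 if hours_late <= 48 else 0.0
    if hours_late <= 24:
        return 0.8  # assumed meaning of "reduced by 20%"
    if hours_late <= 48:
        return 0.6  # assumed meaning of "reduced by 40%"
    return 0.0  # more than 48 hours late: no credit


print(late_credit(10))                       # 0.8
print(late_credit(30))                       # 0.6
print(late_credit(30, used_exception=True))  # 1.0
print(late_credit(50, used_exception=True))  # 0.0
```

Either way, nothing is accepted more than 48 hours after the original due date unless a separate exception has been agreed with the instructor.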

Other important notes

Use of electronic devices in class. Ok, well, sure, the whole class will be on an electronic device this time around. But I would appreciate it if you'd have your video on if possible, and, whether or not you're on video, if you would please not multi-task on stuff that's not related to class. Looking up something we're talking about on the fly, e.g., in order to contribute to the conversation, is related to class. Looking at your email or social media, conversing on Slack, writing code, reading a paper, etc., is not, and in fact it's simply rude.

Academic integrity policy. The Honor Code and Honor Pledge prohibit students from cheating on exams, plagiarizing papers, submitting the same paper for credit in two courses without authorization, buying papers, submitting fraudulent documents, and forging signatures. I expect you to follow the academic integrity policy, but I am exempting the class from the requirement of hand-writing and signing the honor pledge.

Cheating. What you represent as your own work must be your own work. However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you simply copy someone else's solution, you are cheating. If you let someone else copy your solution, you are cheating. If someone dictates a solution to you, you are cheating. Everything you hand in must be in your own words and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. If you work collaboratively with explicit permission from the instructor, you are not cheating. I strongly encourage students to help one another understand the material presented in class, in the readings, and general issues relevant to the assignments. Any student who is caught cheating will be given an F in the course and referred to the Office of Student Conduct. Please don't take that chance -- if you're having trouble understanding the material, or if you need some help clarifying what is ok to do and what is not, please let us know and we will be more than happy to help.

Accessibility and Disability Service. See https://www.counseling.umd.edu/ads for official information. Students with a documented disability should inform me within the add-drop period if academic accommodations will be needed. We will follow a process that involves meeting with me to provide me with a copy of the Accommodations Letter and to obtain my signature on the Acknowledgement of Student Request form. We will plan together how accommodations will be implemented throughout the semester. To obtain the required Accommodations Letter, please contact Accessibility and Disability Service (ADS) at 301-314-7682 or [email protected].

Mental health issues. Let's face it: doing grad work can be really hard. Right now harder than ever. Sometimes students don't know that they need help, or they somehow know they're in trouble but they don't know what to do about it. What's really important for you to know is that at a big university like this one, you don't need to cope with it alone. There are many people on this campus who know how to help students in all kinds of circumstances. It's their job. Some resources you can take advantage of include the Counseling Center, in the Shoemaker Building, 301-314-7651, and Mental Health Services, in the Health Center, 301-314-8106; the Office of Student Affairs, 301-314-8430, is another place you can connect with to find help of various kinds.

If you are concerned about the behavior of another student, and in particular if you are worried that they might pose a threat to themselves or others, see this page for students concerned about another student.

Names and Pronouns. Many people might go by a name in daily life that is different from their legal name. In this classroom, we seek to refer to people by the names that they go by. Pronouns can be a way to affirm someone's gender identity, but they can also be unrelated to a person's identity. They are simply a public way in which people are referred to in place of their name (e.g. "he" or "she" or "they" or "ze" or something else). In this classroom, you are invited (if you want to) to share what pronouns you go by, and we seek to refer to people using the pronouns that they share. The pronouns someone indicates are not necessarily indicative of their gender identity. Visit trans.umd.edu to learn more.

Anti-Harassment. The open exchange of ideas, the freedom of thought and expression, and respectful scientific debate are central to the aims and goals of this course. These require a community and an environment that recognizes the inherent worth of every person and group, that fosters dignity, understanding, and mutual respect, and that embraces diversity. Harassment and hostile behavior are unwelcome in any part of this course. This includes speech or behavior that intimidates, creates discomfort, or interferes with a person’s participation or opportunity for participation in the course. We aim for this course to be an environment where harassment in any form does not happen, including but not limited to: harassment based on race, gender, religion, age, color, national origin, ancestry, disability, sexual orientation, or gender identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking, harassing photography or recording, inappropriate physical contact, and unwelcome sexual attention. Please contact an instructor or staff member if you have questions or if you feel you are the victim of harassment (or otherwise witness harassment of others), or see this page for pointers to relevant resources.

Please note that as "responsible university employees" faculty are required to report any disclosure of sexual misconduct, i.e., they may not hold such disclosures in confidence. Campus Advocates Respond and Educate (CARE) to Stop Violence provides free confidential (including anonymous) advocacy and therapy services to primary and secondary survivors of sexual assault, relationship violence, stalking, and sexual harassment; they are not an official reporting entity but rather a resource that can help navigate options and provide connection to appropriate resources; their General Information contact info is (301) 314-2222 ([email protected]) with a crisis cell contact number at (301) 741-3442. The University of Maryland’s Sexual Misconduct Policy can be found at http://ocrsm.umd.edu.

Religious holidays. Please send me a list of all holidays you observe during the semester by the end of the first week of class, so they can be taken into account in the course schedule.

Emergency protocol. If the university is closed for an extended period of time, we will discuss on Slack how the course will be continued. Please see the discussion about unexpected changes above under Essentials.

Basic needs security. Any student who has difficulty affording groceries or accessing sufficient food to eat every day, or who lacks a safe and stable place to live and believes this may affect their performance in this course, is encouraged to use the resources listed below for support. Students are better served and supported when such circumstances are shared with the professor. Please consider sharing your situation with your professor who may be able to assist you in finding the appropriate resources.

· Campus Pantry: Alleviates food insecurity and provides a safe space to distribute emergency food to current UMD students. The Campus Pantry is located in the Health Center, Heilsa Room 0143 (Ground Floor), and is open each Friday during the semester from 9 a.m. - 5 p.m. Individual appointments are also available. Contact 301.314.8054 or [email protected]. For information see http://campuspantry.umd.edu/.

· Fostering Terp Success: Provides a safe and supportive campus network for students who were or are in foster care, who are homeless or at risk of being homeless, and who are without a supportive family system. For information see https://umd.edu/fostering-terp-success.

· Student Crisis Fund: For students who have an unexpected critical situation and need immediate financial support. Students will be asked for basic information to describe their circumstances of the emergency need and what other sources of funds are available. For more information, visit http://www.crisisfund.umd.edu/gethelp.html.

Use of student work. Your completed work may be used by me in this or subsequent semesters for educational purposes. Before making such use of your work, I will either get your written permission, or render the work anonymous by removing all your personal identification from the material.

Right to change information. Although every effort has been made to be complete and accurate, unforeseen circumstances arising during the semester could require the adjustment of any material given here. Consequently, given due notice to students, the instructor reserves the right to change any information on this syllabus or in other course materials. If you have concerns about any changes, please discuss them with the instructor.