CS287: Machine Learning for Natural Language

Alexander Rush and Yoon Kim, Harvard University

Time: Tues/Thurs 1:00-2:30pm

Location: Pierce 209

Announcements

  • CS 287 will be capped at 30 students this semester. If you are interested in taking the course, please come to our first lecture and fill out the course application.
  • Course Info

    Instructor
    • Alexander "Sasha" Rush
    • Email: Piazza preferred or srush at seas.harvard.edu
    Teaching Assistants
    • Yoon (OH: Monday 7-8pm, MD 2nd floor lounge), Carl (OH: Monday 8-9pm, Lowell)
    Office Hours
    • Tuesday 2:30-4pm: MD 217 (Sasha)
    Grading
    • Assignments (20%)
    • Presentation and Participation (15%)
    • Midterm Exam (15%)
    • Final Project (50%)
    Links
    • Forum and Announcements
    • Syllabus and Collaboration Policy

  • Section Times

    Time and Location
    • Thursday 5-6pm: Pierce Hall 320
    • Friday 11-11:59am: MD 223

    Date | Leader | Location | Topic
    Sep. 1, 10-11am | Mark | Pierce 301 | Math Review (Linear Algebra, Calculus, Probability Theory)
    Sep. 4, 5-6pm | Zhirui | Pierce 301 | Math Review (Linear Algebra, Calculus, Probability Theory)
    Sep. 7, 5-6pm | Rachit | Pierce 320 | Code Review (Python, Numpy, Matplotlib, PyTorch)
    Sep. 8, 11-11:59am | Rachit | MD 223 | Code Review (Python, Numpy, Matplotlib, PyTorch)

  • Final Project

    The ideal outcome of this project would be a paper that could be submitted to a top-tier natural language or machine learning conference such as ACL, EMNLP, NIPS, ICML, or UAI. There are different ways to approach this project, which are discussed in a more comprehensive document available on the course website. There are four separate components of the project.

    You will upload these materials via Canvas. Please see the syllabus (linked on the course website) for a more thorough description of the final project and policies related to collaboration, etc.


    Important Dates

    Date | Due | Description
    March 22 | Abstract and Status Report | A three to four page document that contains a draft of your final abstract, as well as a brief status report on the progress of your project.
    May 3 | Talk Session | You will give a conference-style talk about your project. Talks are 7 minutes long and limited to 3 slides.
    May 9 | Final Report | You will write a report of up to ten pages, in the style of a mainstream CS conference paper. Please use the provided template (see here).

  • Schedule

    Our syllabus this semester consists of two parts. The first part of the semester will be an accelerated background on applied deep learning for natural language processing, with a series of Kaggle competitions. The second part of the semester will consist of student-led paper presentations on the topic of deep probabilistic sequence modeling with latent variables.

    Date | Area | Topic | Assignment (DUE: Tues in class)
    Jan. 23 | Sequence Classification | Intro / Bag-of-Words
    Jan. 25 | | Convolutions
    Jan. 30 | Sequence Modeling | NNLMs | Classification (Kaggle)
    Feb. 1 | | RNNs
    Feb. 6 | Sequence Transduction | Encoding
    Feb. 8 | | Attention
    Feb. 13 | | Search | Modeling (Kaggle)
    Feb. 15 | | REINFORCE
    Feb. 20 | Latent Variable Models of Text | Variational Inference / Variational Autoencoders
    Feb. 22 | | Generative Adversarial Networks
    Feb. 27 | NLP Topics | Problems and Datasets | Translation (Kaggle)
    Mar. 1 | | Guest Lecture: Marc'Aurelio Ranzato (Facebook), Structured Training for NMT (MD G125, 4pm)
    Mar. 6 | | Midterm
    Mar. 8 | Projects | Discussion (Sign-up 11am-3pm)
    Mar. 20 | Student Groups | Conditional VAEs (Justin, Yoon)
    Mar. 22 | | Logic Programming / Differentiable Theory Search | Final Project Abstracts
    Mar. 27 | | Model Bias / Variational NMT
    Mar. 29 | | Image-to-Text / Text-to-Image Latent Variables
    Apr. 3 | | Unsupervised MT
    Apr. 5 | | Discrete GANs
    Apr. 10 | | Speech
    Apr. 12 | | Neural Discourse and Pragmatics
    Apr. 17 | | Reading Comp. / Summary
    Apr. 19 | | Bio / Style Transfer
    Apr. 24 | | Conclusion
    May 3 | | Final Presentation Talks (Evening)
    May 9 | | Final Paper Due