CS 4/5789: Introduction to Reinforcement Learning

Lecture 1

Prof. Sarah Dean

MW 2:45-4pm
Zoom (110 Hollister Hall)

Agenda

 

1. What is Reinforcement Learning (RL)?

2. Logistics and Syllabus

3. Types of Machine Learning (ML)

4. Markov Decision Processes (MDP)

5. Layers of Feedback

AlphaGo

Robotic Manipulation

Algorithmic Media Feeds

...

observation

action

reward

A policy maps observations to actions.

Design the policy to achieve high reward.
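The observation, action, reward loop above can be sketched in a few lines of Python. This is only an illustrative toy, not course code: the environment `CoinGuessEnv`, the policies `copy_policy` and `constant_policy`, and the helper `run_episode` are all hypothetical names made up for this sketch. The environment shows a random bit as the observation and pays reward 1 when the action matches it.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment (an assumption for illustration).

    The observation is a random bit; the reward is 1.0 when the
    action equals the bit that was just observed, else 0.0.
    """

    def __init__(self, horizon=10, seed=0):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0
        self.obs = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.t = 0
        self.obs = self.rng.randint(0, 1)
        return self.obs

    def step(self, action):
        # Reward depends on the observation the agent just acted on.
        reward = 1.0 if action == self.obs else 0.0
        self.t += 1
        self.obs = self.rng.randint(0, 1)
        done = self.t >= self.horizon
        return self.obs, reward, done


def copy_policy(observation):
    """A policy is just a function from observation to action."""
    return observation


def constant_policy(observation):
    """A weaker policy that ignores the observation."""
    return 0


def run_episode(env, policy):
    """Run the observe -> act -> reward loop and return total reward."""
    obs = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    print("copy policy:", run_episode(CoinGuessEnv(), copy_policy))
    print("constant policy:", run_episode(CoinGuessEnv(), constant_policy))
```

In this sketch, "designing a policy to achieve high reward" amounts to choosing the function passed to `run_episode`. A policy that uses the observation can collect the full reward each step, while one that ignores it cannot do better.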

RL is for Sequential Decision-Making

reaction

adaptation

Sequential Decision-Making

For each example, what are the observation, action, and reward?

AlphaGo

Robotic Manipulation

Media Feeds

\(\theta_t-\theta_*\)

Logistics

  • Instructor: Prof. Sarah Dean
  • Head TAs: Albert Tsao and Dhruv Sreenivas
  • Undergrad TAs: Caleb Biddulph, Aayush Chowdhry, Yiqi Jiang, and Sidharth Vasudev

 

  • Contact: Ed Discussion
  • Instructor Office Hours: Mondays 4-5pm on Zoom (eventually, in Gates 416A)
  • TA Office Hours: See Canvas/Ed Discussion

Waitlist and Enrollment

There is high demand for this course!

 

Course staff do not manage waitlist and enrollment.

CS enrollment policies:
https://www.cs.cornell.edu/courseinfo/enrollment

 

Lecture material available on Canvas regardless.

Exams

  • Prelim on March 22 at 7:30pm
    • After the drop deadline!
  • Final exam during finals period, time TBD

Homework

  • Five homework assignments
    • problem set (math) and project (coding)
  • Gradescope
    • neatly written, ideally typeset with LaTeX
  • 5789: Paper review assignments (after Unit 1)
  • Collaboration: discussion is fine, but write your own solutions and code, and do not look at others' work or let others look at yours
  • Late: 1-day grace period; request extensions via a private post on Ed Discussion

Participation

Participation is 5% of the final grade, scored out of 20 points

  • Lecture participation = 1pt each
    • Poll Everywhere: PollEv.com/sarahdean011
  • Helpful posts on Ed Discussion = 2pt each
    • TA endorsement

Schedule

  • Unit 1: Fundamentals of Planning and Control (Jan-Feb)
    • Markov Decision Processes, Dynamic Programming, Value and Policy Iteration, Continuous Control, Linear Quadratic Regulation
  • Unit 2: Learning in MDPs (Feb-Mar)
    • Estimation, Model-based RL, Approximate Dynamic Programming, Policy Optimization
  • Unit 3: Exploration (Mar-Apr)
    • Multi-armed Bandits, Contextual Bandits
  • Unit 4: Extensions and Applications (Apr-May)
    • Imitation learning, state-of-the-art examples

Prerequisites

Machine learning (e.g., CS 4780)

Basics of probability, linear algebra, and programming.

Materials

Lecture Notes and Videos*
*unless technical difficulties prevent recording

Extra Resources (not required)
RL Theory Book: https://rltheorybook.github.io/
Classic RL Book: Sutton & Barto (http://www.incompleteideas.net/book/RLbook2020.pdf)
