CS 4/5789: Lecture 1

CS 4/5789: Introduction to Reinforcement Learning

Lecture 1

Prof. Sarah Dean

MW 2:45-4pm
Zoom (110 Hollister Hall)

Agenda

1. What is Reinforcement Learning (RL)?

2. Logistics and Syllabus

3. Types of Machine Learning (ML)

4. Markov Decision Processes (MDP)

5. Layers of Feedback

AlphaGo

Robotic Manipulation

Algorithmic Media Feeds

...

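[Figure: interaction loop in which the agent receives an observation, takes an action, and receives a reward]

- A policy maps observations to actions.
- Goal: design the policy to achieve high reward.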
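The observation, action, reward loop above can be made concrete with a toy example. This is a minimal sketch, not course code: the environment (a walker on the integer line that is rewarded only at position 0), the function names `policy`, `step`, and `rollout`, and the horizon are all hypothetical choices made for illustration.

```python
def policy(observation: int) -> int:
    """Hypothetical policy: map an observation (current position) to an action (a step)."""
    # Step toward the target position 0; stay put once there.
    if observation > 0:
        return -1
    if observation < 0:
        return 1
    return 0


def step(position: int, action: int) -> tuple[int, float]:
    """Hypothetical environment: apply the action, return the next observation and a reward."""
    next_position = position + action
    reward = 1.0 if next_position == 0 else 0.0  # reward only at the target
    return next_position, reward


def rollout(policy, start: int, horizon: int) -> float:
    """Run the observation -> action -> reward loop for `horizon` steps; return total reward."""
    observation, total_reward = start, 0.0
    for _ in range(horizon):
        action = policy(observation)                     # policy maps observation to action
        observation, reward = step(observation, action)  # environment responds
        total_reward += reward
    return total_reward


print(rollout(policy, start=3, horizon=5))         # 3.0
print(rollout(lambda obs: 0, start=3, horizon=5))  # 0.0 (a policy that never moves)
```

Comparing the two rollouts is the simplest version of "design the policy to achieve high reward": over the same horizon, the policy that steps toward the target collects more reward than the one that stays put.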

RL is for Sequential Decision-Making

[Figure: Sequential Decision-Making, with panels labeled "reaction" and "adaptation"; the observation, action, reward loop is shown for AlphaGo, Robotic Manipulation, and Media Feeds, and one panel shows \(\theta_t-\theta_*\)]
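The slide pairs "adaptation" with the quantity \(\theta_t-\theta_*\). One possible reading, assumed here rather than stated on the slide, is that \(\theta_*\) is an unknown parameter of the environment and \(\theta_t\) is the agent's estimate after \(t\) rounds of feedback. The sketch below uses a running mean of noisy feedback as a stand-in update rule; the function `adapt`, the Gaussian noise model, and the zero initial guess are all hypothetical.

```python
import random


def adapt(theta_star: float, num_steps: int, noise: float = 1.0, seed: int = 0) -> list[float]:
    """Hypothetical example: estimate an unknown parameter theta_star from noisy
    feedback, updating the estimate theta_t after every step.
    Returns the error theta_t - theta_star after each step."""
    rng = random.Random(seed)
    theta_t = 0.0  # initial guess (assumption)
    errors = []
    for t in range(1, num_steps + 1):
        feedback = theta_star + rng.gauss(0.0, noise)  # noisy observation of theta_star
        theta_t += (feedback - theta_t) / t            # running-mean update
        errors.append(theta_t - theta_star)
    return errors


errors = adapt(theta_star=2.0, num_steps=1000)
print(abs(errors[0]), abs(errors[-1]))  # the final error is typically much smaller
```

Because each new estimate depends on all feedback received so far, the error \(\theta_t-\theta_*\) tends to shrink as \(t\) grows; this is one simple sense in which a sequential decision-maker adapts.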


Logistics

Instructor: Prof. Sarah Dean
Head TAs: Albert Tsao and Dhruv Sreenivas
Undergrad TAs: Caleb Biddulph, Aayush Chowdhry, Yiqi Jiang, and Sidharth Vasudev

Contact: Ed Discussion
Instructor Office Hours: Mondays 4-5pm on Zoom (eventually, in Gates 416A)
TA Office Hours: See Canvas/Ed Discussion

Waitlist and Enrollment

There is high demand for this course!

Course staff do not manage waitlist and enrollment.

CS enrollment policies:
https://www.cs.cornell.edu/courseinfo/enrollment

Lecture material is available on Canvas regardless of enrollment status.

Exams

Prelim on March 22 at 7:30pm
- After the drop deadline!
Final exam during finals period, time TBD

Homework

Five homework assignments
- problem set (math) and project (coding)
Submit on Gradescope
- neatly written, ideally typeset with LaTeX
5789: Paper review assignments (after Unit 1)
Collaboration: discussion is fine, but write your own solutions and code; do not look at others' work or let others look at yours
Late policy: 1-day grace period; request extensions on Ed Discussion (private post)

Participation

Participation is 5% of the final grade, scored out of 20 points

Lecture participation = 1 pt each
- Poll Everywhere: PollEv.com/sarahdean011
Helpful posts on Ed Discussion = 2 pts each
- as determined by TA endorsement
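The participation arithmetic can be sketched as follows. The point values (1 pt per lecture, 2 pts per endorsed post), the 20-point scale, and the 5% weight come from the policy above; capping the total at 20 points is an assumption, and the function names are hypothetical.

```python
def participation_points(lectures: int, endorsed_posts: int, cap: int = 20) -> int:
    """1 pt per lecture participation, 2 pts per TA-endorsed post.
    Capping the total at `cap` points is an assumption, not stated policy."""
    return min(cap, 1 * lectures + 2 * endorsed_posts)


def participation_grade_contribution(points: int, max_points: int = 20, weight: float = 0.05) -> float:
    """Fraction of the final grade earned from participation (5% weight)."""
    return weight * points / max_points


points = participation_points(lectures=14, endorsed_posts=2)
print(points)                                    # 18
print(participation_grade_contribution(points))  # about 0.045, i.e. 4.5% of the final grade
```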

Schedule

Unit 1: Fundamentals of Planning and Control (Jan-Feb)
- Markov Decision Processes, Dynamic Programming, Value and Policy Iteration, Continuous Control, Linear Quadratic Regulation
Unit 2: Learning in MDPs (Feb-Mar)
- Estimation, Model-based RL, Approximate Dynamic Programming, Policy Optimization
Unit 3: Exploration (Mar-Apr)
- Multi-armed Bandits, Contextual Bandits
Unit 4: Extensions and Applications (Apr-May)
- Imitation Learning, state-of-the-art examples

Prerequisites

Machine learning (e.g., CS 4780)

Basics of probability, linear algebra, and programming.

Materials

Lecture Notes and Videos*
*unless technical difficulties prevent recording

Extra Resources (not required)
RL Theory Book: https://rltheorybook.github.io/
Classic RL Book: Sutton & Barto (http://www.incompleteideas.net/book/RLbook2020.pdf)

Agenda

1. What is Reinforcement Learning (RL)?

2. Logistics and Syllabus

3. Types of Machine Learning (ML)

4. Markov Decision Processes (MDP)

5. Layers of Feedback in RL