CS 4/5789: Introduction to Reinforcement Learning
Lecture 1
Prof. Sarah Dean
MW 2:45-4pm
Zoom (110 Hollister Hall)
Agenda
1. What is Reinforcement Learning (RL)?
2. Logistics and Syllabus
3. Types of Machine Learning (ML)
4. Markov Decision Processes (MDP)
5. Layers of Feedback


- AlphaGo
- Robotic Manipulation
- Algorithmic Media Feeds
- ...

RL is for Sequential Decision-Making
- observation, action, reward
- a policy maps observation to action
- design policy to achieve high reward
- reaction, adaptation
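The observation, action, reward loop can be sketched in a few lines of Python. This is only an illustrative toy, not course code: the `LineEnvironment` class, the `move_toward_origin` policy, and the reward (negative distance from the origin) are all assumptions made up for this example. It shows a policy as a function from observation to action, and shows that each action changes what is observed next, which is what makes the problem sequential.

```python
class LineEnvironment:
    """Hypothetical toy environment: an agent on the integer line.

    The observation is the current position. The reward after each
    step is the negative distance from the origin (an assumption
    chosen for illustration), so staying near 0 earns high reward.
    """

    def __init__(self, start=3):
        self.position = start

    def step(self, action):
        # The action (-1, 0, or +1) changes the position, so it also
        # changes every future observation.
        self.position += action
        reward = -float(abs(self.position))
        return self.position, reward


def move_toward_origin(observation):
    """A policy maps an observation to an action."""
    if observation > 0:
        return -1
    if observation < 0:
        return 1
    return 0


def run_episode(env, policy, horizon=5):
    """Run the observation -> action -> reward loop for `horizon` steps."""
    total_reward = 0.0
    observation = env.position
    for _ in range(horizon):
        action = policy(observation)
        observation, reward = env.step(action)
        total_reward += reward
    return total_reward


total = run_episode(LineEnvironment(start=3), move_toward_origin, horizon=5)
print(total)  # -3.0 (rewards -2, -1, 0, 0, 0)
```

Swapping in a different policy, for example one that always returns `+1`, yields a much lower total reward, which illustrates the goal of designing a policy to achieve high reward.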

Sequential Decision-Making
- observation, action, reward
- AlphaGo: ?
- Robotic Manipulation: ?
- Media Feeds: ?

Logistics
- Instructor: Prof. Sarah Dean
- Head TAs: Albert Tsao and Dhruv Sreenivas
- Undergrad TAs: Caleb Biddulph, Aayush Chowdhry, Yiqi Jiang, and Sidharth Vasudev
- Contact: Ed Discussion
- Instructor Office Hours: Mondays 4-5pm on Zoom (eventually, in Gates 416A)
- TA Office Hours: See Canvas/Ed Discussion
Waitlist and Enrollment
There is high demand for this course!
Course staff do not manage waitlist and enrollment.
CS enrollment policies:
https://www.cs.cornell.edu/courseinfo/enrollment
Lecture material available on Canvas regardless.
Exams
- Prelim on March 22 at 7:30pm
- After the drop deadline!
- Final exam during finals period, time TBD
Homework
- Five homework assignments
- problem set (math) and project (coding)
- Gradescope
- neatly written, ideally typeset with LaTeX
- 5789: Paper review assignments (after Unit 1)
- Collaboration: discussion is fine, but write your own solutions and code, and do not look at others' work or let others look at yours
- Late: 1-day grace period; request extensions on Ed Discussion (private post)
Participation
Participation is 5% of the final grade, scored out of 20 points
- Lecture participation = 1pt each
- Poll Everywhere: PollEv.com/sarahdean011
- Helpful posts on Ed Discussion = 2pt each
- TA endorsement
Schedule
- Unit 1: Fundamentals of Planning and Control (Jan-Feb)
- Markov Decision Processes, Dynamic Programming, Value and Policy Iteration, Continuous Control, Linear Quadratic Regulation
- Unit 2: Learning in MDPs (Feb-Mar)
- Estimation, Model-based RL, Approximate Dynamic Programming, Policy Optimization
- Unit 3: Exploration (Mar-Apr)
- Multi-armed Bandits, Contextual Bandits
- Unit 4: Extensions and Applications (Apr-May)
- Imitation learning, state-of-the-art examples
Prerequisites
Machine learning (e.g., CS 4780)
Basics of probability, linear algebra, and programming.
Materials
Lecture Notes and Videos*
*unless technical difficulties prevent recording
Extra Resources (not required)
RL Theory Book: https://rltheorybook.github.io/
Classic RL Book: Sutton & Barto (http://www.incompleteideas.net/book/RLbook2020.pdf)