Prof. Sarah Dean, Assistant Professor of Computer Science, Cornell University
MW 2:45-4pm
255 Olin Hall
1. What is Reinforcement Learning (RL)?
2. Logistics and Syllabus
3. Types of Machine Learning (ML)
4. Markov Decision Processes (MDP)
...
[Diagram: the agent-environment loop of observation, action, and reward]
A policy maps observations to actions.
We design the policy to achieve high reward, through reaction and adaptation.
AlphaGo
Robotic Manipulation
Media Feeds
[Plot: parameter error \(\theta_t-\theta_*\)]
2. Logistics and Syllabus
There should be plenty of space!
Course staff do not manage the waitlist or enrollment.
CS enrollment policies:
https://www.cs.cornell.edu/courseinfo/enrollment
Participation is 5% of the final grade, out of 20 points.
Machine learning (e.g., CS 4780)
Background in probability, linear algebra, and programming.
Lecture Slides and Notes
Extra Resources (not required)
RL Theory Book: https://rltheorybook.github.io/
Classic RL Book: Sutton & Barto (http://www.incompleteideas.net/book/RLbook2020.pdf)
3. Types of Machine Learning (ML)
Examples: clustering, principal component analysis (PCA)
"descriptive"
Examples: classification, regression
"predictive"
"presciptive"
Unlike other types of ML, in RL data may not be drawn "i.i.d." from some distribution
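To see why sequential data breaks the i.i.d. assumption, consider a toy two-state Markov chain (our own illustrative sketch, not an example from the lecture): consecutive observations agree far more often than independent coin flips would.

```python
import random

# A minimal sketch (not from the slides): successive samples from a
# two-state Markov chain are correlated, unlike i.i.d. draws.
def sample_chain(T, p_stay=0.9, seed=0):
    """Sample T states from a 2-state chain that stays put with prob p_stay."""
    rng = random.Random(seed)
    s = 0
    states = []
    for _ in range(T):
        states.append(s)
        if rng.random() > p_stay:  # flip state with prob 1 - p_stay
            s = 1 - s
    return states

states = sample_chain(1000)
# Fraction of consecutive pairs that agree: near p_stay = 0.9, not the
# 0.5 an i.i.d. uniform sequence would give.
agree = sum(a == b for a, b in zip(states, states[1:])) / (len(states) - 1)
```

Any learning method that assumes independent samples can be misled by this kind of temporal correlation.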
[Diagram: interaction sequence \(a_t \to (o_t, r_t) \to a_{t+1} \to (o_{t+1}, r_{t+1}) \to \cdots\)]
4. Markov Decision Processes (MDP)
[Diagram: at each step \(t\), the agent takes action \(a_t\) and receives observation \(o_t\) and reward \(r_t\), then observes \(o_{t+1}\)]
[Diagram: MDP loop with state \(s_t\), action \(a_t \sim \pi(s_t)\), reward \(r_t\sim r(s_t, a_t)\), and next state \(s_{t+1}\sim P(s_t, a_t)\)]
An MDP places an assumption on the structure of observations (states) and how they change.
Key Markovian Assumption: the next state and reward depend only on the current state and action:
\(s_{t+1}\sim P(s_t, a_t),\quad r_t\sim r(s_t, a_t)\)
Actions can be chosen based only on the current state: \(a_t \sim \pi(s_t)\)
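The Markovian sampling above can be made concrete with a tiny tabular MDP. This is our own sketch with made-up numbers (the transition table, rewards, and policy are all hypothetical, not from the course):

```python
import random

# Toy tabular MDP (illustrative numbers, not from the lecture):
# 2 states, 2 actions. P[s][a] is a distribution over next states.
P = {0: {0: [0.9, 0.1], 1: [0.5, 0.5]},
     1: {0: [0.3, 0.7], 1: [0.1, 0.9]}}
R = {0: {0: 0.0, 1: 0.1},     # reward r(s, a)
     1: {0: 1.0, 1: 0.5}}

def step(s, a, rng):
    """One Markovian transition: r_t from r(s_t, a_t), s_{t+1} from P(s_t, a_t)."""
    r = R[s][a]
    s_next = rng.choices([0, 1], weights=P[s][a])[0]
    return s_next, r

def policy(s):
    """A deterministic policy pi(s): it depends only on the current state."""
    return 1 if s == 0 else 0

rng = random.Random(1)
s = 0
for t in range(5):
    a = policy(s)
    s, r = step(s, a, rng)
```

Note that `step` and `policy` look only at the current state `s`, which is exactly the Markovian assumption: no history beyond \(s_t\) is needed.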
robot manipulation
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, r, P, \gamma\}\)
Goal: achieve high cumulative reward:
$$\sum_{t=0}^\infty \gamma^t r_t$$
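For a finite reward sequence, the discounted sum above can be computed directly; a minimal sketch (the helper name is ours, not from the course), truncating the infinite sum:

```python
def discounted_return(rewards, gamma=0.9):
    """Compute sum_t gamma^t * r_t for a finite reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# With gamma = 0.5, each successive reward counts half as much:
# 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # -> 1.75
```

The discount \(\gamma < 1\) both keeps the infinite sum finite (bounded by \(\frac{r_{\max}}{1-\gamma}\)) and prioritizes near-term reward.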
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, r, P, \gamma\}\)
maximize over policies \(\pi\): \(\displaystyle \mathbb E\left[\sum_{t=0}^\infty \gamma^t r(s_t, a_t)\right]\)
s.t. \(s_{t+1}\sim P(s_t, a_t), ~~a_t\sim \pi(s_t)\)
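The objective in this optimization can be estimated for any fixed policy by Monte Carlo: roll out \(\pi\) in the MDP many times and average the truncated discounted returns. A sketch under the same kind of made-up toy MDP as above (all numbers and the policy are hypothetical):

```python
import random

# Monte Carlo sketch (toy MDP, illustrative numbers): estimate
# E[sum_t gamma^t r(s_t, a_t)] by averaging truncated rollouts of pi.
P = {0: {0: [0.9, 0.1], 1: [0.5, 0.5]},
     1: {0: [0.3, 0.7], 1: [0.1, 0.9]}}
R = {0: {0: 0.0, 1: 0.1}, 1: {0: 1.0, 1: 0.5}}
GAMMA = 0.9

def pi(s):
    return 1 if s == 0 else 0  # hypothetical policy

def rollout(T, rng, s=0):
    """One trajectory of length T; returns its discounted return."""
    total, discount = 0.0, 1.0
    for _ in range(T):
        a = pi(s)
        total += discount * R[s][a]
        discount *= GAMMA
        s = rng.choices([0, 1], weights=P[s][a])[0]
    return total

rng = random.Random(0)
estimate = sum(rollout(100, rng) for _ in range(500)) / 500
```

Solving the optimization itself, i.e. finding the best \(\pi\) rather than evaluating a given one, is the subject of the rest of the course.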
By Sarah Dean