Many examples taken from David Silver's UCL course
https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTraZDMOYHWgPebj2MfCFzFObQ
*often sequential and preferably optimal.
Andrew Barto
Richard Sutton
These pictures are old, yes, even by academic standards.
Model
Input
Ground truth
Minimize the loss.
as the maximization of cumulative rewards.
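To make "cumulative rewards" concrete, here is a minimal sketch that adds up the rewards an agent collects over one episode. The function name `cumulative_reward`, the reward values, and the optional discount factor `gamma` are illustrative assumptions and do not come from the slides.

```python
def cumulative_reward(rewards, gamma=1.0):
    """Sum of gamma**t * rewards[t] over one episode.

    gamma=1.0 gives the plain (undiscounted) sum; gamma < 1 weights
    later rewards less. The discount is an assumption added for illustration.
    """
    total = 0.0
    # Walk backwards so each step applies: total = r_t + gamma * (future total).
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Made-up rewards for a single five-step episode.
episode_rewards = [0.0, 0.0, 1.0, -1.0, 5.0]

undiscounted = cumulative_reward(episode_rewards)
discounted = cumulative_reward(episode_rewards, gamma=0.5)
print(undiscounted, discounted)
```

An agent that maximizes this quantity may accept a small penalty now (the `-1.0`) to reach a larger reward later (the `5.0`), which is where the sequential aspect comes in.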
What if,
S
D