How Machines Learn
Reinforcement Learning (rewards, games, and agents)
Sources
Opening Lines
Hello, my project is on reinforcement learning in computers. I am going to take computer learning and make it accessible and understandable for everyone in this room.
This is by no means comprehensive, but it will fill in some large gaps in how we understand computers and AI.
Closing Lines
I hope you have gained some insight into how computers learn. And I hope you see that, in many ways, they are just like us.
More tangible ideas
How they can learn to do something
Hook
Add t= to the URL to set the start time
set as background
Reinforcement Learning in Computers

The Dog - The Agent (the learning entity)
The Environment - The world around the agent, which gives both rewards and punishments
At every step, the agent observes the environment and decides which action to take. The environment gives feedback in the form of a reward signal, and the agent's goal is to maximize the total reward it collects over time.
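The observe, act, reward loop described above can be sketched in a few lines of Python. This is a minimal illustration under made-up assumptions: TreatEnvironment, its three actions, and the +1 treat reward are hypothetical stand-ins for the dog example, not part of any reinforcement learning library.

```python
import random
from collections.abc import Callable


class TreatEnvironment:
    """Hypothetical toy environment: the owner says "sit" and gives a treat
    (reward +1) only when the dog sits. Any other action earns 0."""

    actions = ["sit", "bark", "roll over"]

    def observe(self) -> str:
        # What the agent sees; in this toy world it never changes.
        return "owner says sit"

    def step(self, action: str) -> float:
        # The environment's feedback on the chosen action: the reward signal.
        return 1.0 if action == "sit" else 0.0


def run_episode(env: TreatEnvironment, choose_action: Callable[[str], str], steps: int) -> float:
    """The basic loop: observe, act, collect the reward, repeat."""
    total_reward = 0.0
    for _ in range(steps):
        observation = env.observe()
        action = choose_action(observation)
        total_reward += env.step(action)
    return total_reward


rng = random.Random(0)


def untrained_dog(observation: str) -> str:
    # Picks an action at random, ignoring what it sees.
    return rng.choice(TreatEnvironment.actions)


def trained_dog(observation: str) -> str:
    # Has learned that sitting is what earns treats.
    return "sit"


print(run_episode(TreatEnvironment(), untrained_dog, 30))  # a random dog only sometimes earns a treat
print(run_episode(TreatEnvironment(), trained_dog, 30))    # 30.0
```

The trained dog earns a treat on every step, which is what "maximizing the reward signal" looks like; getting from the random dog to the trained one is the job of a learning algorithm.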

Take this teenager, Greg.
He wants to figure out how to maximize his quality of life in high school.
To do this, he has to balance his grades, his social life, and his sleep.
Greg's first action is to get really good grades. As a freshman in high school, Greg has a 4.0 and A's in all his classes.

But something doesn't feel right. Greg hasn't seen his friends much since school started. The work doesn't seem worth it if he has no time away from it.
So Greg flips a switch. In his sophomore year, he stops worrying so much about his grades.

He parties every weekend, stays up all night, and has the time of his life. But his grades plummet. His parents are angry at him, and his dreams of getting into a good school get farther away every day.

Greg now wants to have both. The solution? Less sleep.
Greg reasons that if he cuts back on sleep, he can go out, come home and study, and have the best of both worlds.

So Greg starts to experiment with different routines: more sleep and less partying, more studying and less sleep, until he lands on the best one.
Study for 3 hours every weeknight, go out every weekend night, and get 8 hours of sleep every night.
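Greg's trial and error is what reinforcement learning calls balancing exploration (trying something new) and exploitation (sticking with what has worked). One common way to sketch this is an epsilon-greedy strategy: most weeks, pick the routine with the best average score so far, but occasionally try one at random. The routines, their "quality of life" scores, the noise level, and epsilon = 0.1 below are all made-up assumptions for illustration.

```python
import random

# Hypothetical routines with made-up average weekly "quality of life" scores.
# Greg never sees these numbers directly; he only experiences noisy weeks.
TRUE_SCORES = {
    "all study": 4.0,
    "all party": 3.0,
    "study and party, little sleep": 5.0,
    "3h study weeknights, party weekends, 8h sleep": 8.0,
}


def live_one_week(routine: str, rng: random.Random) -> float:
    """Reward signal: one noisy week of following a routine."""
    return rng.gauss(TRUE_SCORES[routine], 1.0)


def epsilon_greedy(weeks: int = 1000, epsilon: float = 0.1, seed: int = 42):
    rng = random.Random(seed)
    routines = list(TRUE_SCORES)
    # Try every routine once so each has an initial estimate.
    estimates = {r: live_one_week(r, rng) for r in routines}
    counts = {r: 1 for r in routines}
    for _ in range(weeks - len(routines)):
        if rng.random() < epsilon:
            routine = rng.choice(routines)               # explore: try something at random
        else:
            routine = max(estimates, key=estimates.get)  # exploit: best routine so far
        reward = live_one_week(routine, rng)
        counts[routine] += 1
        # Update the running average score for this routine.
        estimates[routine] += (reward - estimates[routine]) / counts[routine]
    return estimates, counts


estimates, counts = epsilon_greedy()
best = max(estimates, key=estimates.get)
print(best, counts[best])
```

Without the occasional random week, Greg could get stuck on the first routine that seemed decent; with too many random weeks, he would waste time on routines he already knows are worse.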
Real World Applications



Concepts and Terminology
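MDP (Markov Decision Process) - Framework describing states, actions, and rewards, where what happens next depends only on the current situation and the action taken
Value Function - Expected total future reward from a situation or action
Exploration - Trying new actions to discover which ones bring rewards
Exploitation - Using what has already been learned to pick the action with the best known reward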
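To make the MDP and value function ideas concrete, here is a minimal sketch of value iteration on a tiny hypothetical MDP: a dog in a hallway of four spots with a treat at the far end. The hallway, the +1 treat reward, and the discount factor of 0.9 are assumptions for illustration. Value iteration is one standard way to compute a value function when the MDP is fully known; real agents often have to learn values from experience instead.

```python
# Hypothetical toy MDP: a hallway with spots 0-3 and a treat at spot 3.
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
GOAL = 3
GAMMA = 0.9  # discount factor: a reward one step later counts 90% as much


def transition(state: int, action: str) -> tuple[int, float]:
    """Return (next_state, reward). The result depends only on the current
    state and action, which is the Markov property."""
    if state == GOAL:
        return state, 0.0  # the episode is over once the treat is reached
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def action_value(values: dict, state: int, action: str) -> float:
    """Immediate reward plus the discounted value of where the action leads."""
    next_state, reward = transition(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol: float = 1e-9) -> dict:
    """Repeatedly update each state's value until nothing changes."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue
            best = max(action_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # updated in place, so later states see new values
        if delta < tol:
            return values


def greedy_policy(values: dict) -> dict:
    """Exploitation: in each state, pick the action with the highest value."""
    return {s: max(ACTIONS, key=lambda a: action_value(values, s, a))
            for s in STATES if s != GOAL}


values = value_iteration()
print({s: round(v, 3) for s, v in values.items()})  # {0: 0.81, 1: 0.9, 2: 1.0, 3: 0.0}
print(greedy_policy(values))  # {0: 'right', 1: 'right', 2: 'right'}
```

Spots closer to the treat have higher values because the reward is fewer steps away, and always choosing the highest-value action ("right") is the exploitation step once the values are known.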
Learning Outcomes
My goal is to translate the language of reinforcement learning into everyday speech as best I can. I want to take ideas like MDPs, value functions, and the like and find fitting everyday comparisons that help people understand them.
Alec Anderson
By Dan Ryan