How Machines Learn

Reinforcement Learning (rewards, games, and agents)

Sources

Opening Lines 

 

Hello, my project is on reinforcement learning in computers. I am going to take the way computers learn and make it accessible and understandable for everyone in this room.

This is by no means comprehensive, but it will fill in some large gaps in how we understand computers and AI.

 

Closing Lines

I hope you have gained some insight into how computers work. And I hope you see that, in many ways, they learn just like we do.

More tangible ideas 

How they can learn to do something 

 

Hook 

Use t= in the URL to set the start time

set as background 

Reinforcement Learning in Computers

The Dog - The Agent (the learning entity)

The Environment - The world around the agent, which gives out both rewards and punishments

 

At every step, the agent observes the environment and decides what action to take. The environment responds with feedback in the form of a reward signal, and the agent's goal is to maximize the total reward it collects over time.
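The loop described above can be sketched in a few lines of Python. This is a minimal illustration under made-up assumptions, not a standard library or framework API: `LineWorld`, `run_episode`, and the reward values (+10 for reaching the goal, -1 for every other step) are hypothetical names and numbers chosen for this example.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0, 1, 2, ... on a line, goal at 3."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        """Start a new episode and return the first observation (the position)."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return (observation, reward, done)."""
        self.position = max(0, self.position + action)  # can't move left of 0
        done = self.position == self.goal
        reward = 10 if done else -1  # hypothetical rewards: the goal pays off, every other step costs
        return self.position, reward, done


def run_episode(env, choose_action, max_steps=50):
    """One pass of the agent-environment loop; returns the total reward collected."""
    observation = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = choose_action(observation)           # the agent decides
        observation, reward, done = env.step(action)  # the environment gives feedback
        total_reward += reward                        # the signal the agent tries to maximize
        if done:
            break
    return total_reward


def always_right(observation):
    return +1


def random_agent(observation):
    return random.choice([-1, +1])


print(run_episode(LineWorld(), always_right))  # 8: two steps at -1, then +10 at the goal
print(run_episode(LineWorld(), random_agent))  # varies from run to run
```

An agent that always moves right reaches the goal in three steps and collects 8, the best possible score in this toy world. A randomly acting agent usually collects less, and that gap is what learning is meant to close.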

Take this teenager, Greg.

 

He wants to figure out how to maximize his quality of life in high school.

 

To do this, he has to balance his grades, his social life, and his sleep.


Greg's first action is to focus on getting really good grades. As a freshman, he has a 4.0 and A's in all his classes.

But something doesn't feel right. Greg has barely seen his friends since school started, and the work doesn't seem worth it if he has no time away from it.

So Greg flips a switch. In his sophomore year, he stops worrying so much about his grades.

He parties every weekend, stays up all night, and has the time of his life. But his grades plummet. His parents are angry at him, and his dreams of getting into a good school get farther away every day. 

Greg now wants to have both. The solution? Less sleep.

Greg reasons that if he cuts down on sleep, he can go out, come home, study, and have the best of both worlds.

So Greg starts experimenting with different routines: more sleep and less partying, more studying and less sleep, until he lands on the best one.

Study for 3 hours every weeknight, go out every weekend night, and make sure to get 8 hours of sleep each night.

Real World Applications 

Concepts and Terminology 

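MDP (Markov Decision Process) - The framework that describes the problem: the states the agent can be in, the actions it can take, how those actions change the state, and the rewards that result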

 

Value Function - The expected total future reward from a state (or from taking a particular action in that state)

 

Exploration - Trying new or uncertain actions to discover which ones lead to rewards

 

Exploitation - Using what the agent has already learned to pick the action it currently expects to be best
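One common algorithm that uses all four of these ideas is tabular Q-learning; it is picked here only as an illustration. In this minimal sketch, the corridor environment, the reward values, and the parameters `alpha`, `gamma`, and `epsilon` are hypothetical choices: `step` plays the role of the MDP's transitions and rewards, the table `q` is a learned value function over state-action pairs, and the epsilon-greedy rule trades off exploration against exploitation.

```python
import random

# Hypothetical toy MDP: a corridor of states 0..4, the agent starts at 0, the goal is 4.
N_STATES = 5
GOAL = 4
LEFT, RIGHT = 0, 1
ACTIONS = [LEFT, RIGHT]


def step(state, action):
    """MDP transitions and rewards: return (next_state, reward, done)."""
    move = -1 if action == LEFT else 1
    next_state = min(max(state + move, 0), N_STATES - 1)
    if next_state == GOAL:
        return next_state, 10.0, True   # hypothetical reward for reaching the goal
    return next_state, -1.0, False      # hypothetical cost of every other step


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy rule for choosing actions."""
    rng = random.Random(seed)
    # q[s][a] is the learned value function: the estimated total discounted
    # reward of taking action a in state s and acting as well as possible afterwards.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Exploration: occasionally try a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploitation: pick the action with the highest estimate (ties broken randomly).
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Nudge the estimate toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
policy = ["left" if q[s][LEFT] > q[s][RIGHT] else "right" for s in range(GOAL)]
print(policy)
```

With these settings the learned policy is expected to be "move right" in every non-goal state. The `epsilon` parameter sets the balance between the last two terms: roughly one move in ten is a random experiment, and the rest use what the agent has learned so far.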

 

 

Learning Outcomes

My goal is to translate the language of reinforcement learning into everyday speech as well as I can. I want to take ideas like MDPs, value functions, and the like and find everyday comparisons that make them easy to understand.


Alec Anderson

By Dan Ryan
