Tampa Devs April 2023
David Khourshid · @davidkpiano
stately.ai
to
Input
Output
Input
Output
Goal
Start
Wander maze
Chase Pac-Man
Run away from Pac-Man
Return to base
Lose Pac-Man
See Pac-Man
Pac-Man eats power pill 🍒
Power pill wears off
Pac-Man eats power pill
Eaten by Pac-Man
Reach base
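The ghost behavior above can be sketched as a finite state machine with a plain transition table. State and event names follow the slide labels. The wiring between them (which event leaves which state) is an assumption inferred from those labels, since the diagram's arrows are not part of the text. The identifiers `ghostTransitions` and `ghostTransition` are hypothetical.

```typescript
// Ghost behavior as a finite state machine: a plain transition table.
// Names follow the slide labels; the wiring between states is assumed.
type GhostState = 'wanderMaze' | 'chasePacMan' | 'runAway' | 'returnToBase';
type GhostEvent =
  | 'seePacMan'
  | 'losePacMan'
  | 'powerPillEaten'
  | 'powerPillWearsOff'
  | 'eatenByPacMan'
  | 'reachBase';

// "Start" on the slide is taken to point at "Wander maze" (assumption).
const initialGhostState: GhostState = 'wanderMaze';

const ghostTransitions: Record<GhostState, Partial<Record<GhostEvent, GhostState>>> = {
  wanderMaze: { seePacMan: 'chasePacMan', powerPillEaten: 'runAway' },
  chasePacMan: { losePacMan: 'wanderMaze', powerPillEaten: 'runAway' },
  runAway: { powerPillWearsOff: 'wanderMaze', eatenByPacMan: 'returnToBase' },
  returnToBase: { reachBase: 'wanderMaze' },
};

// Events that the current state does not handle leave the state unchanged.
function ghostTransition(state: GhostState, event: GhostEvent): GhostState {
  return ghostTransitions[state][event] ?? state;
}
```

For example, `ghostTransition('wanderMaze', 'seePacMan')` yields `'chasePacMan'`, while `ghostTransition('returnToBase', 'seePacMan')` stays in `'returnToBase'` because that state ignores the event.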
🧠 Mental model
🕘 Perception of time
🌈 Imagination
Artificial Intelligence
Machine learning
Deep learning
LLMs
Now
Many years away
Reward
State
Action
Normal mode
Scatter mode
🔭 Observes state of environment
🍒 Takes action
Some state
Desired state
⬆️
❔
⬅️
❔
➡️
❔
⬇️
How good was the action?
Needs lots of trials
Combinatorial state explosion (multidimensionality)
Simulation ≠ real life
There will be consequences
Needs lots of trials
Sparse rewards
Explore vs. exploit
Arbitrary value function
Explore 100% = learns nothing!
Exploit 100% = learns nothing!
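One common way to sit between the two extremes is epsilon-greedy selection: explore with a small probability, exploit otherwise. The slides do not name this technique, so treat the following as an illustrative sketch rather than the approach used in the talk. The function name `epsilonGreedy` and its parameters are assumptions. The random source is injectable so the behavior can be checked deterministically.

```typescript
// Epsilon-greedy action selection (a swapped-in, commonly used technique).
// With probability `epsilon` pick a random action (explore);
// otherwise pick the action with the highest estimated value (exploit).
function epsilonGreedy(
  qValues: number[],
  epsilon: number,
  rng: () => number = Math.random
): number {
  if (qValues.length === 0) {
    throw new Error('qValues must not be empty');
  }
  if (rng() < epsilon) {
    // Explore: uniform random action index.
    return Math.floor(rng() * qValues.length);
  }
  // Exploit: index of the largest value (the first one wins on ties).
  let bestIndex = 0;
  let bestValue = -Infinity;
  qValues.forEach((value, index) => {
    if (value > bestValue) {
      bestValue = value;
      bestIndex = index;
    }
  });
  return bestIndex;
}
```

With `epsilon = 0` the agent always exploits, and with `epsilon = 1` it always explores, which correspond to the two 100% cases on the slide.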
no reward
Do nothing
Owner has treat
Tells dog "down"
ENVIRONMENT
Nothing changes
ENVIRONMENT
reward++
Go down
Get treat
Owner has treat
Tells dog "down"
ENVIRONMENT
Owner praises dog
ENVIRONMENT
Go down
Owner has treat
Tells dog "down"
ENVIRONMENT
Owner praises dog
ENVIRONMENT
💭
Treat?
Run away
Owner has treat
Tells dog "down"
ENVIRONMENT
Undesired outcome
ENVIRONMENT
🤬
"Down"
Reward 🦴
Reward
Policy drives actions
Q-learning
← Discount rate
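The slide's formula did not survive as text, so here is a minimal sketch of the standard tabular Q-learning update, with the discount rate as an explicit parameter. The table layout and the names `QTable`, `maxQ`, `qUpdate`, and `learningRate` are assumptions made for illustration. Unseen state/action pairs are assumed to start at 0.

```typescript
// Standard tabular Q-learning update for one observed step:
//   Q(s, a) <- Q(s, a) + learningRate * (reward + discountRate * max_a' Q(s', a') - Q(s, a))
// The table is a plain nested object: state -> action -> estimated value.
type QTable = Record<string, Record<string, number>>;

// Highest estimated value available from `state` (0 for unseen states, by assumption).
function maxQ(q: QTable, state: string): number {
  const row = q[state];
  if (row === undefined) return 0;
  const values = Object.keys(row).map((action) => row[action] ?? 0);
  return values.length > 0 ? Math.max(...values) : 0;
}

// Applies one update in place and returns the new Q(state, action).
function qUpdate(
  q: QTable,
  state: string,
  action: string,
  reward: number,
  nextState: string,
  learningRate: number,
  discountRate: number
): number {
  const current = q[state]?.[action] ?? 0;
  // The discount rate scales how much future value counts toward the target.
  const target = reward + discountRate * maxQ(q, nextState);
  const updated = current + learningRate * (target - current);
  q[state] = { ...(q[state] ?? {}), [action]: updated };
  return updated;
}
```

Using the dog-training example with illustrative names, `qUpdate(q, 'ownerSaysDown', 'goDown', 1, 'gotTreat', 0.5, 0.9)` on an empty table moves the estimate for "go down" from 0 to 0.5, and repeating the same step moves it to 0.75.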
*in my opinion
Reward
State
Action
Environment = modeled as a state machine
State = finite states (grouped by common attributes)
Reward = how well does expected state (from state machine model) reflect actual state?
Action = shortest path to goal state
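"Action = shortest path to goal state" can be sketched with a breadth-first search over the state machine's transitions. The machine below reuses the ghost labels from earlier in the deck with an assumed wiring. The names `Machine`, `ghostMachine`, and `shortestPath` are hypothetical, and this is one possible search strategy, not necessarily the one used in the talk.

```typescript
// A state machine as a nested object: state -> event -> next state.
type Machine = Record<string, Record<string, string>>;

// Illustrative machine (wiring assumed from the slide labels).
const ghostMachine: Machine = {
  wanderMaze: { seePacMan: 'chasePacMan', powerPillEaten: 'runAway' },
  chasePacMan: { losePacMan: 'wanderMaze', powerPillEaten: 'runAway' },
  runAway: { powerPillWearsOff: 'wanderMaze', eatenByPacMan: 'returnToBase' },
  returnToBase: { reachBase: 'wanderMaze' },
};

// Breadth-first search: returns the shortest sequence of events that leads
// from `start` to `goal`, or null when the goal cannot be reached.
function shortestPath(machine: Machine, start: string, goal: string): string[] | null {
  const queue: Array<{ state: string; events: string[] }> = [{ state: start, events: [] }];
  const visited: Record<string, boolean> = { [start]: true };

  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    if (current.state === goal) return current.events;

    const transitions = machine[current.state] ?? {};
    for (const event of Object.keys(transitions)) {
      const next = transitions[event];
      if (next === undefined || visited[next]) continue;
      visited[next] = true;
      queue.push({ state: next, events: [...current.events, event] });
    }
  }
  return null;
}
```

Here `shortestPath(ghostMachine, 'wanderMaze', 'returnToBase')` returns `['powerPillEaten', 'eatenByPacMan']`, the two-event route, rather than the longer route through `chasePacMan`.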
Input
Output
Desired goal
Input
Output
Desired goal
Input
Desired goal
{
  state: { ... },                     // Current state
  event: { type: 'someEvent', ... },  // Action (event) to execute
  nextState: { ... }                  // Next state after event
}
Given feedback prompt, when I click good, then it takes me to submitted
Given feedback prompt, when I click bad, then it takes me to form
Given form with feedback entered, when I click submit, then it takes me to submitted
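Both the `{ state, event, nextState }` records and the Given/when/then sentences can be derived from a single transition table. The sketch below simplifies each state to its name (the slides use full state objects), and the event name `feedback.good` is an assumption made by analogy with `feedback.bad`. `TransitionRecord`, `feedbackTable`, `listTransitions`, and `describeTransition` are hypothetical names.

```typescript
// One record per transition, mirroring the { state, event, nextState } shape.
// Simplification: states are plain names rather than full state objects.
interface TransitionRecord {
  state: string;
  event: { type: string };
  nextState: string;
}

type TransitionTable = Record<string, Record<string, string>>;

// Feedback flow from the three sentences above ('feedback.good' is an assumed name).
const feedbackTable: TransitionTable = {
  prompt: { 'feedback.good': 'submitted', 'feedback.bad': 'form' },
  form: { 'feedback.submit': 'submitted' },
  submitted: {},
};

// Flattens the table into a list of transition records.
function listTransitions(table: TransitionTable): TransitionRecord[] {
  const records: TransitionRecord[] = [];
  for (const state of Object.keys(table)) {
    const byEvent = table[state] ?? {};
    for (const type of Object.keys(byEvent)) {
      const nextState = byEvent[type];
      if (nextState !== undefined) {
        records.push({ state, event: { type }, nextState });
      }
    }
  }
  return records;
}

// Renders one record as a Given/when/then sentence.
function describeTransition(record: TransitionRecord): string {
  return `Given ${record.state}, when ${record.event.type}, then ${record.nextState}`;
}
```

Calling `listTransitions(feedbackTable).map(describeTransition)` produces one sentence per transition, which is a terser form of the three sentences above.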
"Send feedback that things could have been better"
{
  value: 'submitted',
  context: {
    feedback: 'things could have been better'
  }
}
LLM (GPT-3)
{
  type: 'feedback.update',
  value: 'things could have been better'
}
{ type: 'feedback.bad' }
{ type: 'feedback.submit' }
No payload required (generic events)
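To show how the events above reach the goal state, here is a minimal sketch that replays them through a hand-written reducer. In the talk the LLM is what supplies the events; here they are hard-coded, and no LLM call is made. The event order, the guard that requires non-empty feedback before submitting, the `feedback.good` event name, and the identifiers `feedbackReducer`, `plannedEvents`, and `finalState` are all assumptions.

```typescript
// Events: one carries a payload, the others are generic (no payload).
type FeedbackEvent =
  | { type: 'feedback.good' }
  | { type: 'feedback.bad' }
  | { type: 'feedback.update'; value: string }
  | { type: 'feedback.submit' };

interface FeedbackState {
  value: 'prompt' | 'form' | 'submitted';
  context: { feedback: string };
}

const initialFeedbackState: FeedbackState = { value: 'prompt', context: { feedback: '' } };

// Pure transition function: returns the next state, or the same state
// when the event is not handled in the current state.
function feedbackReducer(state: FeedbackState, event: FeedbackEvent): FeedbackState {
  switch (state.value) {
    case 'prompt':
      if (event.type === 'feedback.good') return { ...state, value: 'submitted' };
      if (event.type === 'feedback.bad') return { ...state, value: 'form' };
      return state;
    case 'form':
      if (event.type === 'feedback.update') {
        return { ...state, context: { feedback: event.value } };
      }
      // Assumed guard: submitting requires that feedback has been entered.
      if (event.type === 'feedback.submit' && state.context.feedback.length > 0) {
        return { ...state, value: 'submitted' };
      }
      return state;
    default:
      return state;
  }
}

// Hard-coded stand-in for the event sequence an LLM would be asked to produce.
const plannedEvents: FeedbackEvent[] = [
  { type: 'feedback.bad' },
  { type: 'feedback.update', value: 'things could have been better' },
  { type: 'feedback.submit' },
];

const finalState = plannedEvents.reduce(feedbackReducer, initialFeedbackState);
```

After the replay, `finalState` has `value: 'submitted'` and `context.feedback: 'things could have been better'`, matching the goal state shown above.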
LLM (GPT-3)
How well can agent adapt to changing environment? (simulated)