Tampa Devs April 2023
David Khourshid · @davidkpiano
stately.ai
State machines to AI
The path to generally intelligent software
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10241093/OdzQX.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10245359/CleanShot_2023-02-23_at_00.21.20_2x.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10245361/CleanShot_2023-02-23_at_00.21.33_2x.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10245364/CleanShot_2023-02-23_at_00.23.25_2x.png)
Command palette
Find in UI: Click → Click → Click → Success
Command palette: Type → Type → Success
ChatGPT & GPT-3
4
Input
Output
Input
Output
Goal
Start
Symbolic AI
Symbols
Solution
📦 Large datasets
❔ Guesswork
🗣 No semantics
🌍 Real world is complicated
State machine AI
Wander maze
Chase Pac-Man
Run away from Pac-Man
Return to base
Lose Pac-Man
See Pac-Man
Pac-Man eats power pill 🍒
Power pill wears off
Pac-Man eats power pill
Eaten by Pac-Man
Reach base
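The ghost diagram above can be written down as a transition table. This is a minimal TypeScript sketch, not code from the talk: the state and event identifiers are hypothetical names for the diagram's labels, and because only the labels survived, which edge leaves which state is assumed from the classic Pac-Man ghost state machine.

```typescript
// Hypothetical identifiers for the ghost's four states (from the diagram labels).
type GhostState = 'wanderMaze' | 'chasePacMan' | 'runAway' | 'returnToBase';

// Hypothetical identifiers for the six transition labels.
type GhostEvent =
  | 'SEE_PACMAN'
  | 'LOSE_PACMAN'
  | 'PACMAN_EATS_POWER_PILL'
  | 'POWER_PILL_WEARS_OFF'
  | 'EATEN_BY_PACMAN'
  | 'REACH_BASE';

// Assumed wiring: each state lists only the events it reacts to.
const ghostMachine: Record<GhostState, Partial<Record<GhostEvent, GhostState>>> = {
  wanderMaze: {
    SEE_PACMAN: 'chasePacMan',
    PACMAN_EATS_POWER_PILL: 'runAway',
  },
  chasePacMan: {
    LOSE_PACMAN: 'wanderMaze',
    PACMAN_EATS_POWER_PILL: 'runAway',
  },
  runAway: {
    POWER_PILL_WEARS_OFF: 'wanderMaze',
    EATEN_BY_PACMAN: 'returnToBase',
  },
  returnToBase: {
    REACH_BASE: 'wanderMaze',
  },
};

// Pure transition function: events not defined for the current state are ignored.
function transition(state: GhostState, event: GhostEvent): GhostState {
  return ghostMachine[state][event] ?? state;
}
```

Events that the current state does not handle leave it unchanged, so `REACH_BASE` while chasing is simply ignored. That is the property that keeps the ghost's behavior predictable: it can only be in one of four states, and only the listed events move it.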
State machine AI
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10244483/68747470733a2f2f6269746275636b65742e6f72672f736861756e65772f7061632d6d616e2f7261772f3437313438303032333361392f73686f74732f6c6561726e2e706e67.png)
Neural networks / Deep learning
Artificial General Intelligence (AGI)
🧠 Mental model
🕘 Perception of time
🌈 Imagination
AGI
Artificial Intelligence
Machine learning
Deep learning
LLMs
Now
Many years away
Agent
Environment
Reward
State
Action
Reinforcement Learning
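The agent/environment loop on this slide can be sketched in a few lines. This is a generic illustration, not an API from any library: `Environment`, `Agent`, `runEpisode`, and the toy corridor environment are all hypothetical.

```typescript
// Hypothetical interfaces: the agent observes a state and takes an action;
// the environment answers with a reward and the next state.
interface Environment<S, A> {
  reset(): S;
  step(state: S, action: A): { nextState: S; reward: number; done: boolean };
}

interface Agent<S, A> {
  act(state: S): A;
  learn(state: S, action: A, reward: number, nextState: S): void;
}

// Runs one episode of the loop and returns the total reward collected.
function runEpisode<S, A>(env: Environment<S, A>, agent: Agent<S, A>, maxSteps = 100): number {
  let state = env.reset();
  let total = 0;
  for (let t = 0; t < maxSteps; t++) {
    const action = agent.act(state); // Action
    const { nextState, reward, done } = env.step(state, action); // Environment responds
    agent.learn(state, action, reward, nextState); // Reward feeds learning
    total += reward;
    state = nextState; // New state
    if (done) break;
  }
  return total;
}

type Move = 'left' | 'right';

// Toy environment (hypothetical): positions 0..3 in a corridor; reaching 3 pays +1.
const corridor: Environment<number, Move> = {
  reset: () => 0,
  step: (s, a) => {
    const next = a === 'right' ? Math.min(s + 1, 3) : Math.max(s - 1, 0);
    return { nextState: next, reward: next === 3 ? 1 : 0, done: next === 3 };
  },
};

// Trivial agent that always moves right and ignores the reward signal.
const alwaysRight: Agent<number, Move> = {
  act: (): Move => 'right',
  learn: () => {},
};
```

`runEpisode(corridor, alwaysRight)` walks 0 → 1 → 2 → 3 and collects a total reward of 1. A real agent would use `learn` to change what `act` returns.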
Environment
![](https://media3.giphy.com/media/cyMqOH8rjgDHG/giphy.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10241657/CleanShot_2023-02-21_at_22.46.48_2x.png)
Normal mode
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10241658/CleanShot_2023-02-21_at_22.47.00_2x.png)
Scatter mode
🔭 Observes state of the environment
Agent
![](https://media4.giphy.com/media/gYWeVOiMmbg3kzCTq5/giphy.gif)
🍒 Takes action
Agent
![](https://media4.giphy.com/media/gYWeVOiMmbg3kzCTq5/giphy.gif)
![](https://media4.giphy.com/media/hkqefnFjn2MWVl6xvq/giphy.gif)
![](https://media3.giphy.com/media/go3pCPP4899Jd3xb4p/giphy.gif)
Policy
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10242408/pac-man-path-screen-4.gif)
Some state
Desired state
⬆️
❔
⬅️
❔
➡️
❔
⬇️
Reward
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10241672/4566657653_694d28a38a_b.jpeg)
How good was the action?
👎 👍
Reward
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10241672/4566657653_694d28a38a_b.jpeg)
Value function
🟡 + 🍒 = 👍
🟡 + 👻 = 👎
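The two emoji equations can be read as a reward signal for Pac-Man (🟡): running into a cherry is good, running into a ghost is bad. A value function then estimates how much discounted reward to expect from here on. In this hypothetical sketch the numbers (+10, −100, 0) and the discount of 0.9 are arbitrary choices, which is exactly the "arbitrary value function" problem the talk lists later.

```typescript
// What Pac-Man (🟡) runs into on a given step (hypothetical encoding).
type Encounter = 'cherry' | 'ghost' | 'nothing';

// Arbitrary reward values chosen for illustration only.
function reward(encounter: Encounter): number {
  switch (encounter) {
    case 'cherry':
      return 10; // 🟡 + 🍒 = 👍
    case 'ghost':
      return -100; // 🟡 + 👻 = 👎
    case 'nothing':
      return 0;
  }
}

// Discounted return of a sequence of encounters: the quantity a value
// function tries to estimate. Later rewards count for less (gamma < 1).
function discountedReturn(encounters: Encounter[], gamma = 0.9): number {
  return encounters.reduce((sum, e, t) => sum + Math.pow(gamma, t) * reward(e), 0);
}
```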
Model
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10243369/68747470733a2f2f6269746275636b65742e6f72672f736861756e65772f7061632d6d616e2f7261772f3437313438303032333361392f73686f74732f6d6f6e74616765322e706e67.png)
Shortcomings
📖 Learning
📍 Planning
Needs lots of trials
Combinatorial state explosion (multidimensionality)
Simulation ≠ real life
There will be consequences
Needs lots of trials
Sparse rewards
Explore vs. exploit
Arbitrary value function
Exploitation
Exploration
Exploration vs. exploitation
Explore 100% = learns nothing!
Exploit 100% = learns nothing!
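One common way to avoid both extremes is ε-greedy action selection: explore with probability ε, otherwise exploit the best-known action. ε-greedy is named here as a standard example; the talk does not say which strategy it uses. The injectable `random` parameter is a hypothetical hook so the choice can be tested deterministically.

```typescript
// ε-greedy selection over estimated action values.
// epsilon = 1 always explores; epsilon = 0 always exploits.
function epsilonGreedy(
  values: number[], // estimated value of each action, indexed by action
  epsilon: number,
  random: () => number = Math.random,
): number {
  if (values.length === 0) throw new Error('no actions');
  if (random() < epsilon) {
    // Explore: pick any action uniformly at random.
    return Math.floor(random() * values.length);
  }
  // Exploit: pick the action with the highest estimate (first one on ties).
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}
```

A small ε such as 0.1 keeps most choices greedy while still occasionally trying something new, like the dog trying "go down" before it knows a treat follows.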
no reward
Do nothing
Owner has treat
Tells dog "down"
ENVIRONMENT
Nothing changes
ENVIRONMENT
Exploration
reward++
Go down
Get treat
Owner has treat
Tells dog "down"
ENVIRONMENT
Owner praises dog
ENVIRONMENT
Exploration
Go down
Owner has treat
Tells dog "down"
ENVIRONMENT
Owner praises dog
ENVIRONMENT
Exploitation
💭
Treat?
Run away
Owner has treat
Tells dog "down"
ENVIRONMENT
Undesired outcome
ENVIRONMENT
🤬
Exploration
"Down"
Reward 🦴
Sparse rewards
Reward
Policy drives actions
Q-learning
← Discount rate
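Only the discount-rate label survived from this slide's equation. For reference, the standard tabular Q-learning update is Q(s, a) ← Q(s, a) + α · [r + γ · max over a′ of Q(s′, a′) − Q(s, a)], where γ is the discount rate and α the learning rate. The sketch below implements that single update step; the string-keyed `Map` table and the defaults α = 0.1, γ = 0.9 are assumptions for illustration.

```typescript
// Q-table: state -> action -> estimated value. Missing entries count as 0.
type QTable = Map<string, Map<string, number>>;

function getQ(q: QTable, state: string, action: string): number {
  return q.get(state)?.get(action) ?? 0;
}

// One tabular Q-learning update:
// Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
function qUpdate(
  q: QTable,
  state: string,
  action: string,
  reward: number,
  nextState: string,
  nextActions: string[], // actions available in nextState (empty if terminal)
  alpha = 0.1, // learning rate
  gamma = 0.9, // discount rate
): number {
  const bestNext = nextActions.length
    ? Math.max(...nextActions.map((a) => getQ(q, nextState, a)))
    : 0;
  const current = getQ(q, state, action);
  const updated = current + alpha * (reward + gamma * bestNext - current);
  if (!q.has(state)) q.set(state, new Map());
  q.get(state)!.set(action, updated);
  return updated;
}
```

The discount rate controls how much a delayed reward is worth now, which is why sparse rewards propagate back through the table only slowly, one update at a time.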
How can we improve RL?
By using one of the oldest AI techniques*
(state machines)
*in my opinion
Agent
Environment
Reward
State
Action
Environment = modeled as a state machine
State = finite states (grouped by common attributes)
Reward = how well does the expected state (from the state machine model) match the actual state?
Action = shortest path to the goal state
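One way to read "reward = how well the expected state matches the actual state" is as a score comparing the state the model predicts with the state the agent observes. The talk does not give a formula; this hypothetical scoring gives one point for a matching finite state value and one per matching context field, normalized to [0, 1].

```typescript
// A state as in the state machine model: a finite value plus context data.
interface MachineState {
  value: string;
  context: Record<string, unknown>;
}

// Hypothetical reward: 1 when expected and actual states agree completely,
// lower as fewer parts agree. Context values are compared shallowly.
function stateMatchReward(expected: MachineState, actual: MachineState): number {
  const keys = Object.keys(expected.context);
  const parts = 1 + keys.length; // the finite value plus each expected context key
  let matched = expected.value === actual.value ? 1 : 0;
  for (const key of keys) {
    if (Object.is(expected.context[key], actual.context[key])) matched++;
  }
  return matched / parts;
}
```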
Making apps intelligent
Input
Output
🤖 LLM
Desired goal
Making apps intelligent
Input
🤖 Reinforcement Learning
Desired goal
🤖 LLM: goal → state
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10246501/CleanShot_2023-02-23_at_11.58.46_2x.png)
stately.ai/editor
1. Model the environment
```typescript
{
  state: { ... },                    // current state
  event: { type: 'someEvent', ... }, // action (event) to execute
  nextState: { ... }                 // next state after event
}
```
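Records of this `{ state, event, nextState }` shape can be folded into a lookup table that the agent can plan over. A minimal sketch, assuming states are identified by a string `value` and ignoring context for simplicity; `TransitionRecord` and `buildModel` are hypothetical names.

```typescript
// One observed step: current state, event executed, resulting state.
interface TransitionRecord {
  state: { value: string };
  event: { type: string };
  nextState: { value: string };
}

// Model: state value -> event type -> next state value.
type Model = Map<string, Map<string, string>>;

function buildModel(records: TransitionRecord[]): Model {
  const model: Model = new Map();
  for (const { state, event, nextState } of records) {
    let edges = model.get(state.value);
    if (!edges) {
      edges = new Map();
      model.set(state.value, edges);
    }
    // A later record for the same (state, event) pair overwrites an earlier one.
    edges.set(event.type, nextState.value);
  }
  return model;
}
```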
1. Model the environment
Given feedback prompt, when I click good, then it takes me to submitted
Given feedback prompt, when I click bad, then it takes me to form
Given form with feedback entered, when I click submit, then it takes me to submitted
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10246501/CleanShot_2023-02-23_at_11.58.46_2x.png)
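The slide does not say how these plain-English descriptions become transitions. As a deterministic stand-in (an assumption, not the approach described in the talk), sentences that follow this exact template can be parsed with a regular expression; `parseScenario` and `ParsedTransition` are hypothetical.

```typescript
// Hypothetical output shape: one transition per sentence.
interface ParsedTransition {
  from: string;
  event: string;
  to: string;
}

// Matches: "Given <state>, when I click <event>, then it takes me to <state>"
const SCENARIO = /^Given (.+?), when I click (.+?), then it takes me to (.+)$/;

function parseScenario(sentence: string): ParsedTransition | null {
  const match = SCENARIO.exec(sentence.trim());
  if (!match) return null; // sentence does not follow the template
  const [, from, event, to] = match;
  return { from, event, to };
}
```

A regex breaks as soon as the wording drifts from the template, which is why free-form descriptions are a natural job for an LLM; the output shape is what matters for the later planning steps.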
2. Determine the goal state
"Send feedback that things could have been better"
```typescript
{
  value: 'submitted',
  context: {
    feedback: 'things could have been better'
  }
}
```
LLM (GPT-3)
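Once the LLM has produced a goal state, the agent needs a test for "have we arrived?". A minimal sketch, assuming the goal is met when the finite state value is equal and every context field named in the goal has the same value (extra context in the actual state is allowed); `isGoalReached` is hypothetical.

```typescript
interface MachineState {
  value: string;
  context: Record<string, unknown>;
}

// True when `actual` satisfies `goal`: same finite value, and every context
// field named in the goal has the same (shallowly compared) value.
function isGoalReached(actual: MachineState, goal: MachineState): boolean {
  if (actual.value !== goal.value) return false;
  return Object.entries(goal.context).every(([key, expected]) =>
    Object.is(actual.context[key], expected),
  );
}
```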
3. Generate event data
```typescript
[
  {
    type: 'feedback.update',
    value: 'things could have been better'
  },
  { type: 'feedback.bad' },
  { type: 'feedback.submit' }
]
```
No payload required (generic events)
LLM (GPT-3)
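These events carry everything needed to drive the feedback form. Below is a hypothetical transition function for the example. The state values `prompt`, `form`, and `submitted` are assumed from the Given/When/Then descriptions, and `feedback.good` is an assumed name for "click good". Only `feedback.update`, `feedback.bad`, and `feedback.submit` appear on the slide.

```typescript
type FeedbackEvent =
  | { type: 'feedback.good' } // assumed name for "click good"
  | { type: 'feedback.bad' }
  | { type: 'feedback.update'; value: string } // payload generated by the LLM
  | { type: 'feedback.submit' };

interface FeedbackState {
  value: 'prompt' | 'form' | 'submitted';
  context: { feedback: string };
}

// Pure transition function; events that do not apply leave the state unchanged.
function feedbackTransition(state: FeedbackState, event: FeedbackEvent): FeedbackState {
  switch (state.value) {
    case 'prompt':
      if (event.type === 'feedback.good') return { ...state, value: 'submitted' };
      if (event.type === 'feedback.bad') return { ...state, value: 'form' };
      return state;
    case 'form':
      if (event.type === 'feedback.update') {
        return { ...state, context: { feedback: event.value } };
      }
      // "Given form with feedback entered": submitting requires non-empty feedback.
      if (event.type === 'feedback.submit' && state.context.feedback !== '') {
        return { ...state, value: 'submitted' };
      }
      return state;
    case 'submitted':
      return state;
  }
}
```

Only `feedback.update` needs LLM-generated data; the other events are generic, so the LLM's job shrinks to filling in payloads rather than deciding every step.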
4. Find shortest path(s)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10243272/CleanShot_2023-02-22_at_12.29.00_2x.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10243273/CleanShot_2023-02-22_at_12.29.52_2x.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/10243274/CleanShot_2023-02-22_at_12.29.23_2x.png)
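Because the model is a graph, the shortest event sequence to the goal can be found with breadth-first search. This self-contained sketch searches over whole states (value and context), using the candidate events as edges; `shortestPath` and its `key` serializer are hypothetical, and this is not a reproduction of any Stately API.

```typescript
// Breadth-first search for the shortest event sequence that takes `start`
// to a state satisfying `isGoal`. `key` serializes a state so that each
// distinct state is expanded at most once.
function shortestPath<S, E>(
  start: S,
  events: E[], // candidate events, e.g. generated by the LLM
  transition: (state: S, event: E) => S, // the state machine model
  isGoal: (state: S) => boolean,
  key: (state: S) => string = (s) => JSON.stringify(s),
): E[] | null {
  const queue: Array<{ state: S; path: E[] }> = [{ state: start, path: [] }];
  const visited = new Set<string>([key(start)]);
  while (queue.length > 0) {
    const { state, path } = queue.shift()!;
    if (isGoal(state)) return path;
    for (const event of events) {
      const next = transition(state, event);
      const k = key(next);
      if (visited.has(k)) continue; // also skips events that change nothing
      visited.add(k);
      queue.push({ state: next, path: [...path, event] });
    }
  }
  return null; // goal unreachable with these events
}
```

Including context in the search matters for the feedback example. Assume the transitions from step 1, the step 3 events plus the assumed `feedback.good`, and a goal test on both value and context. The search then returns `feedback.bad`, `feedback.update`, `feedback.submit`. A search on the state value alone would accept the one-step `feedback.good` path, which reaches `submitted` without the feedback text.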
5. Execute!
How well can the agent adapt to a changing environment? (simulated)
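Executing the plan means sending its events in order. Adapting means noticing when the real environment stops matching the model. In this minimal sketch, each step compares the model's prediction with what actually happened and stops at the first mismatch so the caller can update the model and replan. `execute`, `predict`, and `send` are hypothetical, and `send` is synchronous here for simplicity, whereas a real app would usually be asynchronous.

```typescript
interface ExecutionResult<S, E> {
  finalState: S;
  reachedGoal: boolean;
  // First step where reality differed from the model, if any (signals a replan).
  divergedAt: { event: E; expected: S; actual: S } | null;
}

function execute<S, E>(
  start: S,
  plan: E[],
  predict: (state: S, event: E) => S, // the state machine model
  send: (state: S, event: E) => S, // the real (or simulated) environment
  isGoal: (state: S) => boolean,
  key: (state: S) => string = (s) => JSON.stringify(s),
): ExecutionResult<S, E> {
  let state = start;
  for (const event of plan) {
    const expected = predict(state, event);
    const actual = send(state, event);
    if (key(expected) !== key(actual)) {
      // Model and environment disagree: stop so the caller can replan.
      return { finalState: actual, reachedGoal: isGoal(actual), divergedAt: { event, expected, actual } };
    }
    state = actual;
  }
  return { finalState: state, reachedGoal: isGoal(state), divergedAt: null };
}
```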
Learnings
🤖 LLMs unpredictable
↪️ State machines are really useful
🔮 RL gives us insights for AGI
🚀 Graphs make many things possible
We can make our apps intelligent today.
Thank you Tampa Devs!
![](https://s3.amazonaws.com/media-p.slid.es/uploads/174419/images/9674899/logomark-black-nobg.png)
Resources
David Khourshid · @davidkpiano
stately.ai
TDevs State Machines and AI
By David Khourshid