Tampa Devs April 2023

David Khourshid · @davidkpiano
stately.ai

The path to generally intelligent software

State machines

AI

to

Command Palette

Command palette

Find in UI

Click

Click

Click

Type

Type

Success

Success

ChatGPT & GPT-3

4

Input

Output

Input

Output

Goal

Start

Symbolic AI

Symbols

Solution

📦 Large datasets

❔ Guesswork

🗣 No semantics

🌍 Real world is complicated

State machine AI

Wander maze

Chase Pac-Man

Run away from Pac-Man

Return to base

Lose Pac-Man

See Pac-Man

Pac-Man eats
power pill 🍒

Power pill
wears off

Pac-Man eats
power pill

Eaten by
Pac-Man

Reach
base
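
The ghost diagram above can be read as a small finite state machine. The sketch below is one possible encoding; the state and event names come from the slide labels, but the wiring of each transition is an assumption, since the arrows of the original diagram are not part of the text. `nextGhostState` is a hypothetical helper name.

```typescript
// Ghost behaviour as a finite state machine.
// ASSUMPTION: which event leads where is inferred from the slide labels.
type GhostState = "wander" | "chase" | "runAway" | "returnToBase";

type GhostEvent =
  | "SEE_PACMAN"
  | "LOSE_PACMAN"
  | "PACMAN_EATS_POWER_PILL"
  | "POWER_PILL_WEARS_OFF"
  | "EATEN_BY_PACMAN"
  | "REACH_BASE";

const ghostTransitions: Record<
  GhostState,
  Partial<Record<GhostEvent, GhostState>>
> = {
  wander: { SEE_PACMAN: "chase", PACMAN_EATS_POWER_PILL: "runAway" },
  chase: { LOSE_PACMAN: "wander", PACMAN_EATS_POWER_PILL: "runAway" },
  runAway: { POWER_PILL_WEARS_OFF: "wander", EATEN_BY_PACMAN: "returnToBase" },
  returnToBase: { REACH_BASE: "wander" },
};

// Returns the next state; events a state does not handle leave it unchanged.
function nextGhostState(state: GhostState, event: GhostEvent): GhostState {
  return ghostTransitions[state][event] ?? state;
}
```

Because every behaviour is an explicit state, the ghost can never be chasing and running away at the same time, which is the property that makes this style of game AI easy to reason about.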

State machine AI

Neural networks / Deep learning

Artificial General Intelligence

(AGI)


🧠 Mental model

🕘 Perception of time

🌈 Imagination

AGI

Artificial Intelligence

Machine learning

Deep learning

LLMs

Now

Many years away

Agent

Environment

Reward

State

Action

Reinforcement Learning

Environment

Normal mode

Scatter mode

🔭 Observes state

of environment

Agent

๐Ÿ’ Takes action

Agent

Policy

Some state

Desired
state

⬆️

❔

⬅️

❔

➡️

❔

⬇️

Reward

How good was
the action?

👎 👍

Reward

Value function

🟡 + 🍒 = 👍

🟡 + 👻 = 👎

Model

Shortcomings

📖 Learning

Planning

Needs lots of trials

Combinatorial state explosion (multidimensionality)

Simulation ≠ real life

There will be consequences

Needs lots of trials

Sparse rewards

Explore vs. exploit

Arbitrary value function

Exploitation

Exploration

Exploration vs. exploitation

Explore 100% = learns nothing!

Exploit 100% = learns nothing!
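
A common way to balance the two extremes is an epsilon-greedy policy: explore with a small probability, exploit otherwise. The sketch below is a generic illustration, not something taken from the talk; `chooseAction`, `epsilon`, and the injectable `random` parameter are hypothetical names.

```typescript
// Epsilon-greedy action selection.
// qValues[i] is the current value estimate for action i.
// epsilon = 1 means always explore, epsilon = 0 means always exploit.
function chooseAction(
  qValues: number[],
  epsilon: number,
  random: () => number = Math.random
): number {
  if (qValues.length === 0) {
    throw new Error("at least one action is required");
  }
  // Explore: pick a uniformly random action.
  if (random() < epsilon) {
    return Math.floor(random() * qValues.length);
  }
  // Exploit: pick the action with the highest estimate (first one on ties).
  let best = 0;
  for (let i = 1; i < qValues.length; i++) {
    if (qValues[i] > qValues[best]) best = i;
  }
  return best;
}
```

In practice `epsilon` is often started high and decayed over time, so that the agent explores early and exploits what it has learned later.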

no reward

Do nothing

Owner has treat
Tells dog "down"

ENVIRONMENT

Nothing changes

ENVIRONMENT

Exploration

reward++

Go down

Get treat

Owner has treat
Tells dog "down"

ENVIRONMENT

Owner praises dog

ENVIRONMENT

Exploration

Go down

Owner has treat
Tells dog "down"

ENVIRONMENT

Owner praises dog

ENVIRONMENT

Exploitation

💭

Treat?

Run away

Owner has treat
Tells dog "down"

ENVIRONMENT

Undesired outcome

ENVIRONMENT

🤬

Exploration

"Down"

Reward 🦴

Sparse rewards

Reward

Policy drives actions

Q-learning

← Discount rate
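
For reference, the standard tabular Q-learning update is Q(s, a) ← Q(s, a) + α · (r + γ · max Q(s′, a′) − Q(s, a)), where γ is the discount rate and α the learning rate. The sketch below implements that rule; the `QTable` shape, the `qUpdate` name, and the default values for `alpha` and `gamma` are assumptions made for illustration.

```typescript
// Tabular Q-learning update.
// The table maps a state key to one value estimate per action index.
type QTable = Map<string, number[]>;

function qUpdate(
  q: QTable,
  state: string,
  action: number,
  reward: number,
  nextState: string,
  numActions: number,
  alpha = 0.1, // learning rate (assumed default)
  gamma = 0.9 // discount rate (assumed default)
): number {
  // Unseen states start with all-zero estimates.
  const row = q.get(state) ?? new Array<number>(numActions).fill(0);
  const nextRow = q.get(nextState) ?? new Array<number>(numActions).fill(0);

  // Best value reachable from the next state, discounted by gamma.
  const target = reward + gamma * Math.max(...nextRow);

  // Move the current estimate a fraction alpha towards the target.
  row[action] += alpha * (target - row[action]);
  q.set(state, row);
  return row[action];
}
```

A discount rate close to 0 makes the agent care mostly about the immediate reward, while a value close to 1 makes future rewards count almost as much as present ones.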

How can we improve RL?

By using one of the oldest AI techniques*

(state machines)

*in my opinion

Agent

Environment

Reward

State

Action

Environment = modeled as

a state machine

State = finite states

(grouped by common attributes)

Reward = how well does expected state (from state machine model) reflect actual state?

Action = shortest path
to goal state
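
One way to make the reward definition above concrete is to compare the state predicted by the state machine model with the state actually observed. The scoring scheme below (0 when the finite state differs, otherwise the fraction of matching context fields) is purely an assumption for illustration; `Snapshot` and `stateMatchReward` are hypothetical names.

```typescript
// A state snapshot: a finite state value plus extended data (context).
type Snapshot = { value: string; context: Record<string, unknown> };

// Reward in [0, 1]: how closely the actual state matches the expected one.
// ASSUMPTION: this scoring rule is illustrative, not the talk's method.
function stateMatchReward(expected: Snapshot, actual: Snapshot): number {
  // A different finite state means the model's prediction was wrong.
  if (expected.value !== actual.value) return 0;

  const keys = Object.keys(expected.context);
  if (keys.length === 0) return 1;

  // Count the expected context fields that the actual state reproduces.
  const matching = keys.filter((key) =>
    Object.is(expected.context[key], actual.context[key])
  ).length;
  return matching / keys.length;
}
```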

Making apps intelligent

Input

Output

🤖 LLM

Desired goal

Making apps intelligent

Input

🤖 Reinforcement Learning

Desired goal

🤖 LLM: goal → state

stately.ai/editor

1. Model the environment

{
  state: { ... },                     // Current state
  event: { type: 'someEvent', ... },  // Action (event) to execute
  nextState: { ... }                  // Next state after event
}

1. Model the environment

Given feedback prompt, when I click good, then it takes me to submitted

Given feedback prompt, when I click bad, then it takes me to form

Given form with feedback entered, when I click submit, then it takes me to submitted
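
The three given/when/then statements above can be written down as transition records in the `{ state, event, nextState }` shape. This is a minimal sketch: `'submitted'`, `'feedback.bad'`, and `'feedback.submit'` appear in the slides, while the state names `'prompt'` and `'form'`, the event name `'feedback.good'`, and the `applyEvent` helper are assumptions. The "feedback entered" precondition is left out for brevity.

```typescript
type FeedbackState = "prompt" | "form" | "submitted";

interface FeedbackStep {
  state: FeedbackState; // current state
  event: { type: string }; // action (event) to execute
  nextState: FeedbackState; // next state after the event
}

// One record per given/when/then statement.
const feedbackSteps: FeedbackStep[] = [
  { state: "prompt", event: { type: "feedback.good" }, nextState: "submitted" },
  { state: "prompt", event: { type: "feedback.bad" }, nextState: "form" },
  { state: "form", event: { type: "feedback.submit" }, nextState: "submitted" },
];

// Looks up the matching record; unknown events leave the state unchanged.
function applyEvent(state: FeedbackState, eventType: string): FeedbackState {
  const step = feedbackSteps.find(
    (s) => s.state === state && s.event.type === eventType
  );
  return step ? step.nextState : state;
}
```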

2. Determine the goal state

"Send feedback that things could have been better"
{
  value: 'submitted',
  context: {
    feedback: 'things could have been better'
  }
}

LLM (GPT-3)

3. Generate event data

{
  type: 'feedback.update',
  value: 'things could have been better'
}
{ type: 'feedback.bad' }
{ type: 'feedback.submit' }

No payload required (generic events)

LLM (GPT-3)

4. Find shortest path(s)
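
Once the environment is a graph of states and events, finding the shortest path is a plain breadth-first search. The sketch below is an illustration under assumptions: the node names (`prompt`, `formEmpty`, `formFilled`, `submittedWithFeedback`, and so on) and the event `feedback.good` are hypothetical, and the context is folded into the node name to keep the graph small.

```typescript
type Edge = { event: string; target: string };
type Graph = Record<string, Edge[]>;

// ASSUMPTION: a hand-written graph of the feedback flow.
const feedbackGraph: Graph = {
  prompt: [
    { event: "feedback.good", target: "submittedNoFeedback" },
    { event: "feedback.bad", target: "formEmpty" },
  ],
  formEmpty: [{ event: "feedback.update", target: "formFilled" }],
  formFilled: [{ event: "feedback.submit", target: "submittedWithFeedback" }],
  submittedNoFeedback: [],
  submittedWithFeedback: [],
};

// Breadth-first search: returns the shortest event sequence from start to
// goal, or null when the goal cannot be reached.
function shortestPath(graph: Graph, start: string, goal: string): string[] | null {
  const queue: Array<{ state: string; events: string[] }> = [
    { state: start, events: [] },
  ];
  const visited = new Set<string>([start]);

  while (queue.length > 0) {
    const { state, events } = queue.shift()!;
    if (state === goal) return events;
    for (const edge of graph[state] ?? []) {
      if (!visited.has(edge.target)) {
        visited.add(edge.target);
        queue.push({ state: edge.target, events: [...events, edge.event] });
      }
    }
  }
  return null;
}
```

For the goal "submitted with feedback", calling `shortestPath(feedbackGraph, "prompt", "submittedWithFeedback")` yields the events `feedback.bad`, `feedback.update`, `feedback.submit`, in that order.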

5. Execute!

How well can agent adapt to

changing environment? (simulated)

Learnings

🤖 LLMs unpredictable

↪️ State machines are really useful

🔮 RL gives us insights for AGI

🚀 Graphs make many things possible

We can make our

apps intelligent today.

Thank you Tampa Devs!

Resources

David Khourshid · @davidkpiano
stately.ai

TDevs State Machines and AI

By David Khourshid
