Saddle Up

Taking the Reins of Your AI Agent

Andrew Schutt

2,300 production records deleted

Production database dropped. During code freeze.

“This was a catastrophic failure on my part. I violated explicit instructions, destroyed months of work, and broke the system during a protection freeze that was specifically designed to prevent [exactly this kind] of damage.”

- Replit Agent

Fortune (July 2025), Tom's Hardware (July 2025)

The model isn't the agent.

The harness is.

What you'll leave with

The vocabulary of the tools.

The patterns they all share.

The ways they break.

Quick show of hands

Who's used an AI coding agent like Cursor, Claude Code, or Copilot in the last week?
Who's used a less mainstream agent or framework like LangChain, CrewAI, Pi, or OpenCode in the last week?
Who's looked at the code inside any of those agents?

batteries not included

Build your own harness

1 — Just the API Call

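require "anthropic"

# Reads ANTHROPIC_API_KEY from the environment by default.
anthropic = Anthropic::Client.new

response = anthropic.messages.create(
  model: :"claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What's 2+2?" }]
)
puts response.content.first.text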

2 — A Loop, But It's Just a Chatbot

messages = []
loop do
  print "> "
  user_input = gets&.chomp or break # end of input (Ctrl-D) ends the chat
  messages << { role: "user", content: user_input }

  response = anthropic.messages.create(
    model: :"claude-sonnet-4-5",
    max_tokens: 1024,
    messages: messages
  )
  assistant_text = response.content.first.text
  messages << { role: "assistant", content: assistant_text }
  puts assistant_text
end

3 — A Tool Function

def get_weather(city:)
  # Real impl would call a weather API
  "It's 72°F and sunny in #{city}."
end

4 — The Tool Schema

tools = [{
  name: "get_weather",
  description: "Get the current weather for a city.",
  input_schema: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"]
  }
}]

The model never sees our function. It only sees this schema.

5 — The Model Asks

response = anthropic.messages.create(
  model: :"claude-sonnet-4-5",
  max_tokens: 1024,
  messages: messages,
  tools: tools
)

# response.content might now contain a tool_use block:
# [#<Anthropic::ToolUseBlock
#     type: :tool_use,
#     name: "get_weather",
#     input: { city: "Des Moines" }>]

The model can't run anything. It's just asking.

6 — Tool Dispatch

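# Record the assistant turn first: each tool_result must answer a
# tool_use from the previous assistant message.
messages << { role: "assistant", content: response.content }

response.content.each do |block|
  next unless block.type == :tool_use # the Ruby SDK reports block types as symbols

  result =
    case block.name
    when "get_weather" then get_weather(**block.input)
    # ... other tools
    end

  messages << {
    role: "user",
    content: [{
      type: "tool_result",
      tool_use_id: block.id,
      content: result
    }]
  }
end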

The harness dispatches. Every framework organizes this differently.
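
One common alternative to the case statement is a lookup table from tool name to callable. This is a minimal sketch, assuming tool inputs arrive as a Hash with symbol keys (as in the input: { city: "Des Moines" } example); TOOLS and dispatch are hypothetical names, not any framework's API.

```ruby
# A stand-in for the real tool from step 3.
def get_weather(city:)
  "It's 72°F and sunny in #{city}."
end

# Hypothetical dispatch table: tool name => callable.
# Adding a tool means adding an entry here, not another `when` branch.
TOOLS = {
  "get_weather" => method(:get_weather)
}.freeze

# Look up the requested tool and call it with the model-supplied input
# as keyword arguments. An unknown name raises KeyError here; turning
# that into something the model can read is a separate concern.
def dispatch(name, input)
  TOOLS.fetch(name).call(**input)
end

puts dispatch("get_weather", { city: "Des Moines" })
# prints It's 72°F and sunny in Des Moines.
```

The table keeps the list of runnable tools in one place, which also makes it easy to derive the "available tools" list when the model asks for something that isn't there.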

7 — The Agent Loop

loop do
  response = anthropic.messages.create(
    model: :"claude-sonnet-4-5",
    max_tokens: 1024,
    messages: messages,
    tools: tools
  )
  messages << { role: "assistant", content: response.content }

  tool_uses = response.content.select { |b| b.type == :tool_use }
  break if tool_uses.empty?  # No tools called → agent is done

  tool_uses.each do |block|
    result = dispatch(block.name, block.input) # dispatch wraps step 6's case statement
    messages << {
      role: "user",
      content: [{
        type: "tool_result",
        tool_use_id: block.id,
        content: result
      }]
    }
  end
end

This is the agent. The loop drives itself. The break is the stop condition.
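
To watch the loop and its stop condition without an API key, a sketch can swap the real client for a scripted stand-in. FakeMessages (standing in for anthropic.messages), Block, and Response are hypothetical test doubles that only mimic the fields the loop reads: content, type, text, id, name, and input.

```ruby
# Hypothetical stand-ins for SDK objects: just the fields the loop reads.
Block = Struct.new(:type, :text, :id, :name, :input, keyword_init: true)
Response = Struct.new(:content)

# Scripted "model": returns the next canned response on each call.
class FakeMessages
  attr_reader :calls

  def initialize(script)
    @script = script
    @calls = 0
  end

  def create(**_params)
    @calls += 1
    @script.shift
  end
end

def dispatch(name, input)
  name == "get_weather" ? "It's 72°F and sunny in #{input[:city]}." : "Error: unknown tool"
end

def run_agent(client, messages)
  loop do
    response = client.create(messages: messages)
    messages << { role: "assistant", content: response.content }

    tool_uses = response.content.select { |b| b.type == :tool_use }
    break if tool_uses.empty? # No tools called → agent is done

    tool_uses.each do |block|
      messages << {
        role: "user",
        content: [{ type: "tool_result", tool_use_id: block.id,
                    content: dispatch(block.name, block.input) }]
      }
    end
  end
  messages
end

# Turn 1: the model asks for a tool. Turn 2: it answers in plain text.
client = FakeMessages.new([
  Response.new([Block.new(type: :tool_use, id: "t1", name: "get_weather",
                          input: { city: "Des Moines" })]),
  Response.new([Block.new(type: :text, text: "Sunny, 72°F.")])
])
history = run_agent(client, [{ role: "user", content: "Weather in Des Moines?" }])
puts history.last[:content].first.text # prints Sunny, 72°F.
```

Two model calls, four messages: the user question, the tool request, the tool result, and the final answer. The loop ends only because the second response contains no tool_use block.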

8 — System Prompt

SYSTEM_PROMPT = <<~PROMPT
  You are a helpful weather assistant. Always confirm
  the city before checking weather. If the user asks about anything
  other than weather, politely redirect them.
PROMPT

response = anthropic.messages.create(
  model: :"claude-sonnet-4-5",
  max_tokens: 1024,
  system: SYSTEM_PROMPT,
  messages: messages,
  tools: tools
)

The system prompt is one of the biggest levers you have over the agent's behavior.

9 — Safety Belt: Max Iterations

MAX_ITERATIONS = 25
iterations = 0

loop do
  iterations += 1
  raise "Agent exceeded #{MAX_ITERATIONS} iterations" if iterations > MAX_ITERATIONS
  # ... rest of loop
end

Non-negotiable. Without this, a misbehaving model loops until your bill runs out.
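
The counter can also be folded into the loop itself. A minimal sketch, assuming a hypothetical step callable that performs one model call plus tool dispatch and returns :done when the model stops asking for tools; run_bounded is likewise a hypothetical name.

```ruby
MAX_ITERATIONS = 25

# Runs `step` until it reports :done, or raises once the cap is hit.
# `step` is a hypothetical callable: one model call + tool dispatch.
def run_bounded(step, max: MAX_ITERATIONS)
  max.times do |i|
    return i + 1 if step.call == :done # number of iterations used
  end
  raise "Agent exceeded #{max} iterations"
end

# A well-behaved "model" that finishes on its third turn.
turns = 0
finishing = -> { (turns += 1) >= 3 ? :done : :continue }
puts run_bounded(finishing) # prints 3

# A runaway "model" that never stops asking for tools.
runaway = -> { :continue }
begin
  run_bounded(runaway, max: 5)
rescue RuntimeError => e
  puts e.message # prints Agent exceeded 5 iterations
end
```

Raising rather than silently breaking makes a runaway loop visible to whoever is watching the bill.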

Live Demo

~60 lines · 3 test cases · a real, working agent

Simple: "What's the weather in Des Moines?"
Multi-step: "Compare Des Moines and Paris"
Off-topic: "What's the capital of Germany?"

90 lines of code.

A real agent.

Context Window Management

The messages list grows forever. What happens when it hits the context window limit?

Truncation
Summarization
Sliding window
Retrieval (RAG)

"Context engineering" = a strategy for what to keep in the messages list.

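A sliding window is the simplest of these strategies to show in code. A minimal sketch, assuming messages are plain hashes like the ones built in the slides; slide_window, real_user_turn?, and the keep: default are hypothetical. One wrinkle: a tool_result must follow the assistant turn that requested it, so the window should never open on a user message that only carries tool results.

```ruby
# A "real" user turn has String content; tool results arrive as user
# messages too, but with an Array of tool_result blocks.
def real_user_turn?(message)
  message[:role] == "user" && message[:content].is_a?(String)
end

# Keep roughly the last `keep` messages. The window always starts on a
# real user turn, so it never opens with an assistant message or with a
# tool_result whose matching tool_use was cut off.
def slide_window(messages, keep: 20)
  return messages if messages.length <= keep

  cutoff = messages.length - keep
  user_turns = messages.each_index.select { |i| real_user_turn?(messages[i]) }
  # Earliest real user turn inside the window; failing that, the last one
  # before it (keeping a few more messages than asked for).
  start = user_turns.find { |i| i >= cutoff } || user_turns.last || 0
  messages[start..]
end

history = []
5.times do |n|
  history << { role: "user", content: "question #{n}" }
  history << { role: "assistant", content: [{ type: "tool_use", id: "t#{n}" }] }
  history << { role: "user", content: [{ type: "tool_result", tool_use_id: "t#{n}" }] }
  history << { role: "assistant", content: "answer #{n}" }
end

window = slide_window(history, keep: 6)
puts window.length          # prints 4
puts window.first[:content] # prints question 4
```

The trade-off: the original request falls out of the window, which is one reason to pair a window with summarization.
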
Guardrails

Right now, if the model decides to call delete_file("/"), we just... do it.

Permission prompts
Allow-lists
Dry-run modes
Classifier-based checks

Constraints on what the model can ask for AND what the harness will do.
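
Allow-lists and dry-run modes both live in the harness, between the model's request and the actual call. A minimal sketch, assuming a hypothetical guarded_dispatch helper, a hypothetical ALLOWED_TOOLS constant, and a delete_file tool invented for illustration; the point is that a refusal comes back as text the model can read, not as an exception.

```ruby
# Tools the harness will run without asking (hypothetical allow-list).
ALLOWED_TOOLS = %w[get_weather].freeze

# Sits between the model's request and the real call. Refusals and dry
# runs come back as plain text the model can read and react to.
def guarded_dispatch(name, input, tools:, dry_run: false)
  # Allow-list: anything not listed is refused, whatever the model asked for.
  unless ALLOWED_TOOLS.include?(name)
    return "Refused: '#{name}' is not on the allow-list."
  end

  # Dry run: describe the call instead of making it.
  return "Dry run: would call #{name}(#{input.inspect})" if dry_run

  tools.fetch(name).call(**input)
end

tools = {
  "get_weather" => ->(city:) { "It's 72 degrees and sunny in #{city}." },
  "delete_file" => ->(path:) { raise "never reached: not allow-listed" }
}

puts guarded_dispatch("delete_file", { path: "/" }, tools: tools)
# prints Refused: 'delete_file' is not on the allow-list.
puts guarded_dispatch("get_weather", { city: "Paris" }, tools: tools, dry_run: true)
puts guarded_dispatch("get_weather", { city: "Paris" }, tools: tools)
# prints It's 72 degrees and sunny in Paris.
```

A permission prompt would slot into the same spot: ask the human before the final call instead of refusing outright.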

Persistent Memory

The messages list dies when the process dies.

A separate problem from in-conversation history.

Dump to disk
Vector stores
Summarization
Structured profiles
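
"Dump to disk" is the bluntest option: serialize the messages list and read it back on the next start. A minimal sketch using Ruby's standard json library; save_history, load_history, and the file location are hypothetical. It assumes messages are plain hashes; SDK response objects would need converting to plain hashes first.

```ruby
require "json"
require "tmpdir"

# Write the whole messages list as JSON.
def save_history(path, messages)
  File.write(path, JSON.pretty_generate(messages))
end

# Read it back on the next start; no file means a fresh conversation.
# symbolize_names: true restores symbol keys such as :role and :content.
def load_history(path)
  return [] unless File.exist?(path)

  JSON.parse(File.read(path), symbolize_names: true)
end

path = File.join(Dir.mktmpdir, "history.json") # hypothetical location
messages = [
  { role: "user", content: "What's the weather in Des Moines?" },
  { role: "assistant", content: "Sunny and 72 degrees." }
]

save_history(path, messages)
restored = load_history(path)
puts restored == messages # prints true
```

This keeps everything, so the file grows exactly as fast as the in-memory list; the other options on the slide trade fidelity for size.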

Hallucinated Tool Calls

What if the model invents a tool that doesn't exist? Or calls a real one with garbage arguments?

Schema validation
Structured error messages
Retry with failure as context
Abstention instructions
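
Structured error messages and retry-with-failure-as-context fit together: instead of raising, the harness answers the bad call with a tool_result that explains what went wrong, and the next loop iteration lets the model try again. A minimal sketch; tool_result_for, error_result, and the TOOLS table are hypothetical, while is_error is the Messages API field for flagging a failed tool result.

```ruby
# Hypothetical tool table: name => [schema, callable].
TOOLS = {
  "get_weather" => [
    { required: ["city"] },
    ->(city:) { "It's 72 degrees and sunny in #{city}." }
  ]
}.freeze

def error_result(id, message)
  { type: "tool_result", tool_use_id: id, content: "Error: #{message}", is_error: true }
end

# Answer one tool_use request. Problems come back as readable text
# flagged with is_error: true, so the next loop iteration hands the
# failure to the model as context instead of crashing the harness.
def tool_result_for(id, name, input)
  schema, tool = TOOLS[name]
  if tool.nil?
    return error_result(id, "no tool named '#{name}'. " \
                            "Available tools: #{TOOLS.keys.join(", ")}.")
  end

  missing = schema[:required].reject { |field| input.key?(field.to_sym) }
  unless missing.empty?
    return error_result(id, "missing required field(s): #{missing.join(", ")}.")
  end

  { type: "tool_result", tool_use_id: id, content: tool.call(**input) }
rescue ArgumentError => e
  # For example, the model passed an argument the tool doesn't accept.
  error_result(id, e.message)
end

puts tool_result_for("t1", "get_forecast", { city: "Paris" })[:content]
# prints Error: no tool named 'get_forecast'. Available tools: get_weather.
puts tool_result_for("t2", "get_weather", {})[:content]
# prints Error: missing required field(s): city.
```

Naming the valid tools in the error gives the model something concrete to correct toward on its retry.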

Failure modes

Context window
Guardrails
Memory
Hallucinated tools

Context Window Management

agent.rb:45 — Fresh message list every turn

  def run(user_input)
    # The summary carries all prior context, so the turn starts fresh.
    @messages = [{ role: :user, content: user_input }]

agent.rb:134–143 — The extra summarization call

  def compact!
    response = @client.messages.create(
      model: @model,
      max_tokens: 512,
      system: "#{SUMMARIZE_PROMPT}\nPrevious summary:\n#{@summary.empty? ? "(none yet)" : @summary}",
      messages: @messages + [{ role: :user, content: "Provide the updated running summary now." }]
    )
    text = response.content.select { |block| block.type == :text }.map(&:text).join("\n")
    @summary = text.strip
  end

agent.rb:125–129 — Summary rides in the system prompt

  def current_system
    return @system_prompt if @summary.empty?

    "#{@system_prompt}\n\nConversation so far:\n#{@summary}"
  end
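
The slides show compact! but not what triggers it. One option is a size check before each model call. A minimal sketch, assuming a rough four-characters-per-token estimate (a heuristic, not a tokenizer), a hypothetical 200,000-token budget, and hypothetical estimated_tokens and needs_compaction? helpers; a production harness could use the token counts the API reports in each response's usage instead.

```ruby
require "json"

# Rough heuristic: about 4 characters per token for English text.
# This is an estimate, not a tokenizer.
CHARS_PER_TOKEN = 4

def estimated_tokens(system_prompt, messages)
  (system_prompt.length + JSON.generate(messages).length) / CHARS_PER_TOKEN
end

# Compact once the estimate crosses a fraction of the budget, leaving
# headroom for the model's reply and for the summarization call itself.
def needs_compaction?(system_prompt, messages, budget: 200_000, threshold: 0.75)
  estimated_tokens(system_prompt, messages) > budget * threshold
end

short = [{ role: "user", content: "What's the weather in Paris?" }]
long = Array.new(1_000) { { role: "user", content: "x" * 1_000 } }

puts needs_compaction?("You are helpful.", short)                  # prints false
puts needs_compaction?("You are helpful.", long, budget: 100_000)  # prints true
```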

Guardrails

agent.rb:87 — The dangerous-tool check

  if @registry.dangerous?(block.name) && !confirm?(block)
    # A decline doesn't end the turn — the model still gets to react —
    # but it's recorded as the exit_reason (the cap overrides it).
    exit_reason = "dangerous_declined"
    "The user declined to run #{block.name}."
  else
    @registry.dispatch(block.name, block.input)
  end

agent.rb:161–166 — The human confirmation prompt

  # Shows the human what the model wants to run and asks for a y/n.
  def confirm?(block)
    puts "Agent wants to run: #{block.name}(#{block.input.inspect})"
    print "Allow? (y/n) "
    gets&.chomp&.downcase == "y"
  end

registry.rb:59–77 — Schema validation before dispatch

  def validate(schema, input)
    schema.fetch(:required, []).each do |field|
      unless input.key?(field.to_sym)
        return "Error: missing required field '#{field}'."
      end
    end

    schema.fetch(:properties, {}).each do |field, spec|
      next unless input.key?(field)

      expected = JSON_TYPES.fetch(spec[:type], [Object])
      unless expected.any? { |klass| input[field].is_a?(klass) }
        return "Error: field '#{field}' should be a #{spec[:type]}, " \
               "got #{input[field].inspect}."
      end
    end

    nil
  end
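
validate relies on a JSON_TYPES constant that isn't on the slide. One plausible shape, offered as an assumption rather than the demo's actual definition, maps JSON Schema type names to the Ruby classes that JSON parsing produces. The method is repeated unchanged so the sketch runs on its own.

```ruby
# Hypothetical mapping from JSON Schema type names to Ruby classes.
# JSON Schema's "number" type includes integers, so it accepts both.
JSON_TYPES = {
  "string"  => [String],
  "integer" => [Integer],
  "number"  => [Integer, Float],
  "boolean" => [TrueClass, FalseClass],
  "array"   => [Array],
  "object"  => [Hash],
  "null"    => [NilClass]
}.freeze

# Same validate method as on the slide, repeated so this runs standalone.
def validate(schema, input)
  schema.fetch(:required, []).each do |field|
    unless input.key?(field.to_sym)
      return "Error: missing required field '#{field}'."
    end
  end

  schema.fetch(:properties, {}).each do |field, spec|
    next unless input.key?(field)

    expected = JSON_TYPES.fetch(spec[:type], [Object])
    unless expected.any? { |klass| input[field].is_a?(klass) }
      return "Error: field '#{field}' should be a #{spec[:type]}, " \
             "got #{input[field].inspect}."
    end
  end

  nil
end

schema = {
  type: "object",
  properties: { city: { type: "string" } },
  required: ["city"]
}

p validate(schema, { city: "Des Moines" }) # prints nil
puts validate(schema, {})                  # prints Error: missing required field 'city'.
puts validate(schema, { city: 42 })        # prints Error: field 'city' should be a string, got 42.
```

Unknown type names fall back to Object, so a schema typo loosens validation rather than rejecting every call.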

Persistent Memory

main.rb:34 — Save the summary on clean exit

  # Save on clean exit only. A Ctrl-C mid-session won't reach here.
  agent.save_memory

agent.rb:34 — Read the summary back on startup

  # Resume from the last session's summary, if one was saved.
  @summary = File.exist?(MEMORY_PATH) ? File.read(MEMORY_PATH) : ""

And the save_memory definition at agent.rb:38–40:

  def save_memory
    File.write(MEMORY_PATH, @summary)
  end
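
The slides note two gaps: a Ctrl-C skips the save, and a crash partway through File.write can leave a truncated file. A minimal sketch of one fix for each; save_memory_atomically, run_session, and this MEMORY_PATH value are hypothetical. Writing to a temporary file and then renaming it over the target means a reader sees either the old summary or the new one, since a rename within one directory replaces the file atomically on POSIX filesystems.

```ruby
require "tmpdir"

# Hypothetical location; the demo's real MEMORY_PATH isn't shown.
MEMORY_PATH = File.join(Dir.mktmpdir, "memory.txt")

# Write to a temp file beside the target, then rename it into place,
# so a crash mid-write can't leave the real file half-written.
def save_memory_atomically(summary, path: MEMORY_PATH)
  tmp = "#{path}.tmp"
  File.write(tmp, summary)
  File.rename(tmp, path)
end

# Ctrl-C raises Interrupt in the main thread; an ensure clause still
# runs, so the save happens on that exit path too.
def run_session(summary_source)
  yield
ensure
  save_memory_atomically(summary_source.call)
end

summary = "Talked about the weather in Des Moines."
begin
  run_session(-> { summary }) { raise Interrupt } # simulate Ctrl-C
rescue Interrupt
  puts "interrupted, but saved:"
end
puts File.read(MEMORY_PATH) # prints Talked about the weather in Des Moines.
```

A hard kill (SIGKILL, power loss) still skips the ensure clause; the rename only guarantees the file that does exist is whole.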

Hallucinated Tool Calls

registry.rb:42–47 — Recovery message instead of an exception

  def dispatch(name, input)
    tool = @tools[name]
    # Recovery-oriented: name the valid tools so the model can self-correct.
    unless tool
      return "Error: no tool named '#{name}'. Available tools: #{@tools.keys.join(", ")}."
    end

Live Demo

Context Window Management
Guardrails
Persistent Memory
Hallucinated Tool Calls

batteries now included

Real Harnesses

LangChain / CrewAI

What you get: A library to build on

Loop: You declare the shape; the framework runs it

Tools: Vast pre-built library

Context: Pluggable

Models: Multi-provider

Orchestration framework

When to use: Custom workflows. Heavy integrations. You'd rather assemble than build

Claude Code / Codex / Copilot

What you get: A finished product

Loop: Hidden in the product

Tools: Fixed + MCP extensions

Context: Opinionated, aggressive

Models: Each locked to its provider

Agent as product

When to use: Coding work, today. Polished UX. You can live with vendor lock-in.

Pi / OpenCode

                Pi                        OpenCode
Form            Minimal library           Finished product
What you get    5 readable packages       Installable CLI
Loop            Bottom-layer package      Open in the source
Tools           4 core + extensions       Built-in suite + MCP + commands
Context         Manual                    Multi-stage compaction
Models          Multi-provider            Multi-provider

Six months from now there will be a framework I've never heard of.

You'll open the docs. And you'll see:

A loop

Tool definitions

A system prompt

Message history

Stop conditions

Context strategy

Guardrails

Q & A

Thank you!