Saddle Up
Taking the Reins of Your AI Agent
Andrew Schutt

2,300 production records deleted
Production database dropped. During code freeze.

“This was a catastrophic failure on my part. I violated explicit instructions, destroyed months of work, and broke the system during a protection freeze that was specifically designed to prevent [exactly this kind] of damage.”
- Replit Agent
Fortune (July 2025), Tom's Hardware (July 2025)
The model isn't the agent.
The harness is.
What you'll leave with
The vocabulary of the tools.
The patterns they all share.
The ways they break.
Quick show of hands
- Who's used an AI coding agent like Cursor, Claude Code, or Copilot in the last week?
- Who's used a less mainstream agent or framework like LangChain, CrewAI, Pi, or OpenCode in the last week?
- Who's looked at the code inside any of those agents?
batteries not included
Build your own harness

1 — Just the API Call
response = anthropic.messages.create(
  model: :"claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What's 2+2?" }]
)
puts response.content.first.text
2 — A Loop, But It's Just a Chatbot
messages = []
loop do
  print "> "
  user_input = gets.chomp
  messages << { role: "user", content: user_input }
  response = anthropic.messages.create(
    model: :"claude-sonnet-4-5",
    max_tokens: 1024,
    messages: messages
  )
  assistant_text = response.content.first.text
  messages << { role: "assistant", content: assistant_text }
  puts assistant_text
end
3 — A Tool Function
def get_weather(city:)
  # Real impl would call a weather API
  "It's 72°F and sunny in #{city}."
end
4 — The Tool Schema
tools = [{
  name: "get_weather",
  description: "Get the current weather for a city.",
  input_schema: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"]
  }
}]
The model never sees our function. It only sees this schema.
5 — The Model Asks
response = anthropic.messages.create(
  model: :"claude-sonnet-4-5",
  max_tokens: 1024,
  messages: messages,
  tools: tools
)
# response.content might now contain a tool_use block:
# [#<Anthropic::ToolUseBlock
#     type: "tool_use",
#     name: "get_weather",
#     input: { city: "Des Moines" }>]
The model can't run anything. It's just asking.
6 — Tool Dispatch
response.content.each do |block|
  # The SDK may return type as a symbol (:tool_use), so compare as strings.
  next unless block.type.to_s == "tool_use"

  result =
    case block.name
    when "get_weather" then get_weather(**block.input)
    # ... other tools
    else "Error: no tool named '#{block.name}'."
    end

  messages << {
    role: "user",
    content: [{
      type: "tool_result",
      tool_use_id: block.id,
      content: result
    }]
  }
end
The harness dispatches. Every framework organizes this differently.
7 — The Agent Loop
loop do
  response = anthropic.messages.create(
    model: :"claude-sonnet-4-5",
    max_tokens: 1024,
    messages: messages,
    tools: tools
  )
  messages << { role: "assistant", content: response.content }

  # The SDK may return type as a symbol (:tool_use), so compare as strings.
  tool_uses = response.content.select { |b| b.type.to_s == "tool_use" }
  break if tool_uses.empty? # No tools called → agent is done

  # Return every result for this turn in a single user message.
  results = tool_uses.map do |block|
    {
      type: "tool_result",
      tool_use_id: block.id,
      content: dispatch(block.name, block.input)
    }
  end
  messages << { role: "user", content: results }
end
This is the agent. The loop drives itself. The break is the stop condition.
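The loop calls dispatch(block.name, block.input), which the slide doesn't define. A minimal sketch of what it might look like, assuming a lookup table of handlers; TOOL_HANDLERS and the error wording are illustrative names, not part of any SDK:

```ruby
# Hypothetical tool implementation, mirroring the get_weather stub from step 3.
def get_weather(city:)
  "It's 72°F and sunny in #{city}."
end

# Assumed registry: tool name (as the model sees it) => callable.
TOOL_HANDLERS = {
  "get_weather" => ->(input) { get_weather(**input) }
}.freeze

# Look up the handler by name and call it with the model-supplied input.
# Keys are symbolized so keyword arguments work whether the input hash
# arrived with string or symbol keys. Failures come back as strings so the
# model can read them in the tool_result.
def dispatch(name, input)
  handler = TOOL_HANDLERS[name.to_s]
  return "Error: no tool named '#{name}'." unless handler

  handler.call(input.to_h.transform_keys(&:to_sym))
rescue ArgumentError => e
  "Error: bad arguments for '#{name}': #{e.message}"
end

puts dispatch("get_weather", { "city" => "Des Moines" })
puts dispatch("get_forecast", {})
```

Returning error strings instead of raising keeps the loop alive: the model sees the failure as an ordinary tool result and gets a chance to correct itself.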
8 — System Prompt
SYSTEM_PROMPT = <<~PROMPT
  You are a helpful weather assistant. Always confirm
  the city before checking weather. If the user asks about anything
  other than weather, politely redirect them.
PROMPT
response = anthropic.messages.create(
  model: :"claude-sonnet-4-5",
  max_tokens: 1024,
  system: SYSTEM_PROMPT,
  messages: messages,
  tools: tools
)
The system prompt is one of the biggest levers anyone has.
9 — Safety Belt: Max Iterations
MAX_ITERATIONS = 25
iterations = 0
loop do
  iterations += 1
  raise "Agent exceeded #{MAX_ITERATIONS} iterations" if iterations > MAX_ITERATIONS
  # ... rest of loop
end
Non-negotiable. Without this, a misbehaving model loops until your bill runs out.
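To watch the cap fire without spending tokens, the loop can be run against a stand-in model. This is a sketch under assumptions: STUBBORN_MODEL and POLITE_MODEL are hypothetical lambdas that replace the API call and return plain hashes rather than SDK objects, and the cap is lowered to 5 for the demonstration.

```ruby
MAX_ITERATIONS = 5

# Hypothetical stand-in for the API: asks for a tool on every call, never finishes.
STUBBORN_MODEL = lambda do |_messages|
  [{ type: "tool_use", id: "toolu_1", name: "get_weather", input: { city: "Des Moines" } }]
end

# Hypothetical stand-in that answers in plain text right away.
POLITE_MODEL = ->(_messages) { [{ type: "text", text: "It's sunny." }] }

def run_agent(model, messages, max_iterations: MAX_ITERATIONS)
  iterations = 0
  loop do
    iterations += 1
    raise "Agent exceeded #{max_iterations} iterations" if iterations > max_iterations

    content = model.call(messages)
    messages << { role: "assistant", content: content }

    tool_uses = content.select { |block| block[:type] == "tool_use" }
    break if tool_uses.empty? # normal stop condition

    # Canned tool output; a real harness would dispatch here.
    results = tool_uses.map do |block|
      { type: "tool_result", tool_use_id: block[:id], content: "72°F and sunny" }
    end
    messages << { role: "user", content: results }
  end
  messages
end

begin
  run_agent(STUBBORN_MODEL, [{ role: "user", content: "Weather?" }])
rescue RuntimeError => e
  puts "Stopped: #{e.message}"
end
```

The normal exit is still the break; the cap only matters when the model never stops asking for tools.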
Live Demo
~60 lines · 3 test cases · a real, working agent
- Simple: "What's the weather in Des Moines?"
- Multi-step: "Compare Des Moines and Paris"
- Off-topic: "What's the capital of Germany?"
90 lines of code.
A real agent.
Context Window Management
The messages list grows forever. What happens when it hits the context window limit?
- Truncation
- Summarization
- Sliding window
- Retrieval (RAG)
"Context engineering" = a strategy for what to keep in the messages list.
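Of the strategies listed, the sliding window is the simplest to show. A minimal sketch, assuming messages are hashes with :role and :content as in the earlier steps; sliding_window and plain_user_turn? are illustrative names:

```ruby
# True for an ordinary user message (string content), as opposed to a
# user message that carries tool_result blocks.
def plain_user_turn?(message)
  message[:role].to_s == "user" && message[:content].is_a?(String)
end

# Keep roughly the last `keep` messages. A tool_result must follow the
# assistant message that holds its tool_use, so the window is trimmed
# until it starts on a plain user turn.
def sliding_window(messages, keep: 6)
  return messages if messages.length <= keep

  window = messages.last(keep)
  window = window.drop(1) until window.empty? || plain_user_turn?(window.first)
  # No safe starting point in the window: keep everything rather than
  # send a history that starts mid tool exchange.
  window.empty? ? messages : window
end

history = (1..5).flat_map do |i|
  [{ role: "user", content: "question #{i}" },
   { role: "assistant", content: "answer #{i}" }]
end
puts sliding_window(history, keep: 3).map { |m| m[:content] }.inspect
```

The window counts messages, not tokens, so it only approximates the real limit; a production harness would budget by token count.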
Guardrails
Right now, if the model decides to call delete_file("/"), we just... do it.
- Permission prompts
- Allow-lists
- Dry-run modes
- Classifier-based checks
Constraints on what the model can ask for AND what the harness will do.
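Three of these checks fit in one small function that runs before dispatch. A sketch under assumptions: ALLOWED_TOOLS, NEEDS_APPROVAL, and guard are hypothetical names, and the approve callable stands in for a real permission prompt.

```ruby
require "set"

ALLOWED_TOOLS  = Set["get_weather"].freeze  # safe to run unattended
NEEDS_APPROVAL = Set["delete_file"].freeze  # run only with a human's OK

# Returns [:run, nil] when the call may proceed, otherwise [:refuse, message].
# The message is meant to go back to the model as the tool result.
def guard(name, input, dry_run: false, approve: ->(_name, _input) { false })
  unless ALLOWED_TOOLS.include?(name) || NEEDS_APPROVAL.include?(name)
    return [:refuse, "Error: '#{name}' is not on the allow-list."]
  end
  return [:refuse, "Dry run: would call #{name}(#{input.inspect})."] if dry_run

  if NEEDS_APPROVAL.include?(name) && !approve.call(name, input)
    return [:refuse, "The user declined to run #{name}."]
  end

  [:run, nil]
end

p guard("get_weather", { city: "Paris" })
p guard("delete_file", { path: "/" })
p guard("format_disk", {})
```

The default approve lambda says no, so a dangerous tool is refused unless the caller wires in a real confirmation step.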
Persistent Memory
The messages list dies when the process dies.
A separate problem from in-conversation history.
- Dump to disk
- Vector stores
- Summarization
- Structured profiles
Hallucinated Tool Calls
What if the model invents a tool that doesn't exist? Or calls a real one with garbage arguments?
- Schema validation
- Structured error messages
- Retry with failure as context
- Abstention instructions
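"Retry with failure as context" can be as small as sending the error back as the tool result. A sketch, assuming the message shape from step 6; the Anthropic Messages API accepts an is_error flag on tool_result blocks, while KNOWN_TOOLS, tool_result_message, and result_for are illustrative names:

```ruby
KNOWN_TOOLS = %w[get_weather].freeze

# Build the user message that carries one tool result back to the model.
def tool_result_message(tool_use_id, content, error: false)
  block = { type: "tool_result", tool_use_id: tool_use_id, content: content }
  block[:is_error] = true if error
  { role: "user", content: [block] }
end

# For an invented tool, answer with a structured error that names the
# valid tools, so the next model turn can self-correct.
def result_for(tool_use_id, name)
  if KNOWN_TOOLS.include?(name)
    tool_result_message(tool_use_id, "It's 72°F and sunny.") # canned output
  else
    tool_result_message(
      tool_use_id,
      "Error: no tool named '#{name}'. Available tools: #{KNOWN_TOOLS.join(', ')}.",
      error: true
    )
  end
end

p result_for("toolu_1", "get_forecast")
```

Because the error lands in the messages list, the retry costs nothing extra in the harness: the next loop iteration is the retry.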
Failure modes
- Context window
- Guardrails
- Memory
- Hallucinated tools
Context-window management
agent.rb:45 — Fresh message list every iteration
def run(user_input)
  # The summary carries all prior context, so the turn starts fresh.
  @messages = [{ role: :user, content: user_input }]
agent.rb:134–143 — The extra summarization call
def compact!
  response = @client.messages.create(
    model: @model,
    max_tokens: 512,
    system: "#{SUMMARIZE_PROMPT}\nPrevious summary:\n#{@summary.empty? ? "(none yet)" : @summary}",
    messages: @messages + [{ role: :user, content: "Provide the updated running summary now." }]
  )
  text = response.content.select { |block| block.type == :text }.map(&:text).join("\n")
  @summary = text.strip
end
agent.rb:125–129 — Summary rides in the system prompt
def current_system
  return @system_prompt if @summary.empty?
  "#{@system_prompt}\n\nConversation so far:\n#{@summary}"
end
Guardrails
agent.rb:87 — The dangerous-tool check
if @registry.dangerous?(block.name) && !confirm?(block)
  # A decline doesn't end the turn — the model still gets to react —
  # but it's recorded as the exit_reason (the cap overrides it).
  exit_reason = "dangerous_declined"
  "The user declined to run #{block.name}."
else
  @registry.dispatch(block.name, block.input)
end
agent.rb:161–166 — The human confirmation prompt
# Shows the human what the model wants to run and asks for a y/n.
def confirm?(block)
  puts "Agent wants to run: #{block.name}(#{block.input.inspect})"
  print "Allow? (y/n) "
  gets&.chomp&.downcase == "y"
end
registry.rb:59–77 — Schema validation before dispatch
def validate(schema, input)
  schema.fetch(:required, []).each do |field|
    unless input.key?(field.to_sym)
      return "Error: missing required field '#{field}'."
    end
  end
  schema.fetch(:properties, {}).each do |field, spec|
    next unless input.key?(field)
    expected = JSON_TYPES.fetch(spec[:type], [Object])
    unless expected.any? { |klass| input[field].is_a?(klass) }
      return "Error: field '#{field}' should be a #{spec[:type]}, " \
             "got #{input[field].inspect}."
    end
  end
  nil
end
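validate looks types up in JSON_TYPES, which the excerpt doesn't show. One plausible definition, offered as an assumption rather than the contents of registry.rb, maps JSON Schema type names to the Ruby classes a parsed value could have; type_ok? is a hypothetical helper that repeats the same lookup for demonstration:

```ruby
# Assumed mapping: JSON Schema type name => acceptable Ruby classes.
JSON_TYPES = {
  "string"  => [String],
  "integer" => [Integer],
  "number"  => [Integer, Float],
  "boolean" => [TrueClass, FalseClass],
  "array"   => [Array],
  "object"  => [Hash]
}.freeze

# Same check validate performs: unknown type names fall back to [Object],
# which accepts any value.
def type_ok?(type_name, value)
  JSON_TYPES.fetch(type_name, [Object]).any? { |klass| value.is_a?(klass) }
end

puts type_ok?("string", "Des Moines") # a String passes
puts type_ok?("string", 72)           # an Integer does not
```

Listing both Integer and Float under "number" matters because a JSON parser turns 72 into an Integer and 72.5 into a Float.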
Persistent Memory
main.rb:34 — Save the summary on clean exit
# Save on clean exit only. A Ctrl-C mid-session won't reach here.
agent.save_memory
agent.rb:34 — Read the summary back on startup
# Resume from the last session's summary, if one was saved.
@summary = File.exist?(MEMORY_PATH) ? File.read(MEMORY_PATH) : ""
agent.rb:38–40 — The save_memory definition
def save_memory
  File.write(MEMORY_PATH, @summary)
end
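The comment in main.rb notes that a Ctrl-C mid-session skips the save. One way to close that gap, sketched under assumptions (with_saved_memory is a hypothetical wrapper, not part of the demo code), is to put the write in an ensure clause. Ruby raises Interrupt on Ctrl-C, and ensure runs either way.

```ruby
# Run the session block, then write the summary whether the block
# finished normally or was interrupted with Ctrl-C.
# `summary` is a callable so the latest value is read at save time.
def with_saved_memory(path, summary)
  yield
rescue Interrupt
  warn "\nInterrupted; saving memory before exit."
ensure
  File.write(path, summary.call)
end

# Usage, assuming an agent object like the one in the excerpts:
#   with_saved_memory(MEMORY_PATH, -> { agent.summary }) do
#     run_repl(agent)
#   end
```

This does not cover a hard kill, which gives the process no chance to run ensure blocks; surviving that would take saving after every turn.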
Hallucinated tool calls
registry.rb:42–47 — Recovery message instead of an exception
def dispatch(name, input)
  tool = @tools[name]
  # Recovery-oriented: name the valid tools so the model can self-correct.
  unless tool
    return "Error: no tool named '#{name}'. Available tools: #{@tools.keys.join(", ")}."
  end
Live Demo
- Context Window Management
- Guardrails
- Persistent Memory
- Hallucinated Tool Calls
batteries now included
Real Harnesses

LangChain / CrewAI
What you get: A library to build on
Loop: You declare the shape; the framework runs it
Tools: Vast pre-built library
Context: Pluggable
Models: Multi-Provider
Orchestration framework
When to use: Custom workflows. Heavy integrations. You'd rather assemble than build
Claude Code / Codex / Copilot
What you get: A finished product
Loop: Hidden in the product
Tools: Fixed + MCP extensions
Context: Opinionated, aggressive
Models: Each locked to its provider
Agent as product
When to use: Coding work, today. Polished UX. You can live with vendor lock-in.
Pi / OpenCode
| | Pi | OpenCode |
|---|---|---|
| Form | Minimal library | Finished product |
| What you get | 5 readable packages | Installable CLI |
| Loop | Bottom-layer package | Open in the source |
| Tools | 4 core + extensions | Built-in suite + MCP + commands |
| Context | Manual | Multi-stage compaction |
| Models | Multi-provider | Multi-provider |
Six months from now there will be a framework I've never heard of.
You'll open the docs. And you'll see:
A loop
Tool definitions
A system prompt
Message history
Stop conditions
Context strategy
Guardrails
Q & A
Thank you!