MCP vs. the command line
a head-to-head evaluation of agent tool integration patterns

2026-04-22 AI Engineer Miami
What are we talking about?
- What is MCP? What's a skill?
- Why are we talking about this?
- An experiment: MCP vs. skills
- What the data says
- What this means for you
What is MCP?
- 97 million monthly SDK downloads
- 17,000+ active MCP servers

What is a Skill?

The debate
Simon Willison, October 2025


Pieter Levels, March 2026

Vibes vs. data
Audience vote

Why GitHub?
The test bed

25 tasks across 4 tiers
- Tier 1: trivial reads (5 tasks)
- Tier 2: harder reads (6 tasks)
- Tier 3: writes (6 tasks)
- Tier 4: complex analysis (8 tasks)
Three arms
Arm 1: MCP
@modelcontextprotocol/server-github
allowed_tools = ["mcp__github__*"]
Arm 2: LobeHub skill
2,187 lines
Arm 3: Vault skill
The rules
- Same agent: Claude Agent SDK
- Same base prompt
- Same max turns (25)
- Same model (Claude Opus 4.6)
- Different tooling instructions
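
To make the rules concrete, here is a minimal sketch of how the three arms could be declared so that only the tooling instructions vary. The `Arm` class, the placeholder prompt text, and the instruction strings are all hypothetical; only the arm names, the 25-turn limit, the model name, and the `mcp__github__*` pattern come from the slides. The skill arms' tool lists are left empty because the slides do not state them.

```python
from dataclasses import dataclass

# Hypothetical placeholder; the real base prompt is not shown in the slides.
BASE_PROMPT = "Complete the GitHub task described below."


@dataclass(frozen=True)
class Arm:
    """One experimental arm: shared settings plus arm-specific tooling."""
    name: str
    tooling_instructions: str
    allowed_tools: tuple[str, ...] = ()
    model: str = "Claude Opus 4.6"   # same model for every arm
    max_turns: int = 25              # same turn limit for every arm
    base_prompt: str = BASE_PROMPT   # same base prompt for every arm

    def full_prompt(self) -> str:
        # The only part of the prompt that differs between arms.
        return f"{self.base_prompt}\n\n{self.tooling_instructions}"


ARMS = [
    Arm("mcp", "Use the GitHub MCP tools.", ("mcp__github__*",)),
    Arm("lobehub_skill", "Follow the LobeHub skill instructions."),
    Arm("vault_skill", "Follow the Vault skill instructions."),
]


def shared_settings(arm: Arm) -> tuple[str, int, str]:
    """Everything that must be identical across arms."""
    return (arm.model, arm.max_turns, arm.base_prompt)
```

Checking that `shared_settings` returns the same tuple for every arm is a cheap guard against accidentally changing more than one variable.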
What's an evaluation?
Testing LLMs using other LLMs
Arize AI
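
A rough sketch of what "testing LLMs using other LLMs" can look like in code. This is not the Arize API or the prompt used in the experiment; `JUDGE_TEMPLATE`, `judge_correctness`, and the `judge` callable are hypothetical names, and the judge is passed in as a plain function so that any model client could be plugged in.

```python
from typing import Callable

# Hypothetical grading prompt; the talk's actual judge prompt is not shown.
JUDGE_TEMPLATE = """You are grading an agent's answer against ground truth.
Ground truth: {truth}
Agent answer: {answer}
Reply with exactly one word: correct or incorrect."""


def judge_correctness(answer: str, truth: str,
                      judge: Callable[[str], str]) -> bool:
    """Ask a judge model whether `answer` matches `truth`.

    `judge` is any function that takes a prompt and returns the model's
    text reply, for example a thin wrapper around an LLM client.
    """
    prompt = JUDGE_TEMPLATE.format(truth=truth, answer=answer)
    verdict = judge(prompt).strip().lower()
    # Exact comparison, because "incorrect" contains the substring "correct".
    return verdict == "correct"
```

In tests, a stub such as `lambda prompt: "correct"` can stand in for the real model call.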

Datasets and experiments

Five metrics
- correctness: LLM-as-judge vs. ground truth
- output_quality: LLM-as-judge on readability
- latency: wall-clock time
- cost: in dollars
- tool_fidelity: % of tool calls that used the tools the arm was given
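
The tool_fidelity metric is simple enough to sketch directly. This version assumes each run is logged as a list of tool names and that allowed tools are matched with glob-style patterns such as `mcp__github__*`; the function name, the specific tool names in the example, and the glob semantics are assumptions, not details from the experiment's code.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence


def tool_fidelity(calls: Sequence[str],
                  allowed_patterns: Sequence[str]) -> Optional[float]:
    """Percentage of tool calls whose name matches an allowed pattern.

    Returns None when the run made no tool calls, since a percentage
    is undefined in that case.
    """
    if not calls:
        return None
    on_menu = sum(
        1 for name in calls
        if any(fnmatchcase(name, pattern) for pattern in allowed_patterns)
    )
    return 100.0 * on_menu / len(calls)


# Hypothetical run: two MCP calls and two shell calls.
example_calls = [
    "mcp__github__list_issues",
    "Bash",
    "mcp__github__get_issue",
    "Bash",
]
example_fidelity = tool_fidelity(example_calls, ["mcp__github__*"])
```

A low score means the agent left the tools it was handed and reached for something else.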




[Charts: all tasks; Tier 4 only]




[Charts: all tasks; Tier 4 only]
Ouch




LLM explanations

Claude starts cheating
- T22: "Find issues opened and closed in the same milestone"
- Vault average: 7 calls, 54 seconds, $0.19
- MCP on its worst run: 71 calls, 485 seconds, $1.98
- Only 3 of those 71 calls actually went to MCP tools!
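
The T22 numbers above can be turned into ratios with a few lines of arithmetic. Note that this compares the Vault average against MCP's worst run, so it is an upper bound on the gap for this task rather than an average. The `Run` class and variable names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    calls: int
    seconds: float
    dollars: float


vault_avg = Run(calls=7, seconds=54, dollars=0.19)      # Vault average on T22
mcp_worst = Run(calls=71, seconds=485, dollars=1.98)    # MCP's worst T22 run
mcp_calls_on_menu = 3                                   # calls that hit MCP tools

call_ratio = round(mcp_worst.calls / vault_avg.calls, 1)        # 10.1x more calls
time_ratio = round(mcp_worst.seconds / vault_avg.seconds, 1)    # 9.0x slower
cost_ratio = round(mcp_worst.dollars / vault_avg.dollars, 1)    # 10.4x the cost
mcp_share = round(100 * mcp_calls_on_menu / mcp_worst.calls, 1) # 4.2% of calls

print(f"{call_ratio}x calls, {time_ratio}x time, {cost_ratio}x cost, "
      f"{mcp_share}% of calls used MCP")
```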


[Charts: all tasks; Tier 4 only]
MCP is a curated menu
When MCP wins
- T13: "Create a branch, open a PR"
- MCP: 5 tool calls, 30 seconds, $0.12
- LobeHub average: 22 calls, 87 seconds, $0.49
The command line wins
...right?
gh is not a fair test of Skills
Claude is already great at this


Skills do help a little


Your CLI is probably not this popular
MCP is more than tools
- Tools
- Resources
- Prompts
MCP can be an agent
Auth
MCP uses OAuth 2.1
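
One concrete consequence of OAuth 2.1 is that authorization-code flows use PKCE. As a small illustration, the sketch below generates a PKCE verifier and its S256 challenge using only the standard library. The function names are illustrative, and this covers only the PKCE step, not a full MCP authorization flow.

```python
import base64
import hashlib
import secrets


def s256_challenge(verifier: str) -> str:
    """Derive the S256 code challenge: base64url(SHA-256(verifier)), unpadded."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for one authorization request."""
    # 32 random bytes encode to a 43-character URL-safe string,
    # which sits at the lower bound of the allowed verifier length.
    verifier = secrets.token_urlsafe(32)
    return verifier, s256_challenge(verifier)
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code is useless on its own.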
Is MCP dead?
"MCP vs. the command line" is the wrong question
When to use the CLI
- Local workflows
- Tools the model already knows
- One user with one set of credentials
- Composition and drilling in
When to use MCP
- Remote services
- Proprietary tools
- Multi-tenant auth (OAuth)
- Stateful workflows
- Agent-backed capabilities
The headline findings
- MCP collapses on open-ended tasks (6x cost, 5x slower)
- Skills help a little — good ones beat bad ones
- Training data does most of the work for popular tools
Use the right tool for the job
Thank you
Code and data: github.com/seldo/acme-sdk-python
These slides: slides.com/seldo/mcp-vs-command-line
Follow me on Bluesky: 🦋 @seldo.com
MCP vs. the command line (AIE Miami)
By Laurie Voss