MCP vs. the command line

a head-to-head evaluation of agent tool integration patterns

2026-04-22 AI Engineer Miami

What are we talking about?

What is MCP? What's a skill?
Why are we talking about this?
An experiment: MCP vs. skills
What the data says
What this means for you

What is MCP?

97 million monthly SDK downloads
17,000+ active MCP servers

What is a Skill?

The debate

Simon Willison, October 2025

Pieter Levels, March 2026

Vibes vs. data

Audience vote

Why GitHub?

The test bed

25 tasks across 4 tiers

Tier 1: trivial reads (5 tasks)
Tier 2: harder reads (6 tasks)
Tier 3: writes (6 tasks)
Tier 4: complex analysis (8 tasks)

Three arms

Arm 1: MCP

@modelcontextprotocol/server-github
allowed_tools = ["mcp__github__*"]

Arm 2: LobeHub skill

2,187 lines

lobehub.com/skills

Arm 3: Vault skill

340 lines

github.com/georgekhananaev/claude-skills-vault

The rules

Same agent: Claude Agent SDK
Same base prompt
Same max turns (25)
Same model (Claude Opus 4.6)
Different tooling instructions
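The setup above can be written down as a small configuration sketch. Only the values come from the slides; the structure and names (`Arm`, `SHARED`, `TIERS`, `plan_runs`) are illustrative assumptions, not the author's harness. Numbering tasks T1 to T25 in tier order is also an assumption, and the number of repetitions per task is not stated, so it is left as a parameter.

```python
from dataclasses import dataclass
from itertools import product

# Values taken from the slides; the structure and names are hypothetical.
SHARED = {"agent": "Claude Agent SDK", "model": "Claude Opus 4.6", "max_turns": 25}

TIERS = {
    "tier1_trivial_reads": 5,
    "tier2_harder_reads": 6,
    "tier3_writes": 6,
    "tier4_complex_analysis": 8,
}


@dataclass(frozen=True)
class Arm:
    name: str
    tooling: str  # the only thing that differs between arms


ARMS = [
    Arm("mcp", "@modelcontextprotocol/server-github"),
    Arm("lobehub_skill", "LobeHub skill (2,187 lines)"),
    Arm("vault_skill", "Vault skill (340 lines)"),
]


def task_ids():
    """Number tasks T1..T25 in tier order (the numbering scheme is an assumption)."""
    total = sum(TIERS.values())
    return [f"T{i}" for i in range(1, total + 1)]


def plan_runs(repeats=1):
    """Every arm attempts every task `repeats` times under the shared settings."""
    return [
        {"arm": arm.name, "task": task, "repeat": r, **SHARED}
        for arm, task, r in product(ARMS, task_ids(), range(repeats))
    ]


runs = plan_runs()
print(len(task_ids()), len(runs))
```

With one repetition this gives 25 tasks and 75 runs; holding everything in `SHARED` constant is what lets differences in the results be attributed to the tooling.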

What's an evaluation?

Testing LLMs using other LLMs

Arize AI

Datasets and experiments

Five metrics

correctness: LLM-as-judge vs. ground truth
output_quality: LLM-as-judge on readability
latency: wall-clock time
cost: in dollars
tool_fidelity: % of tool calls that used the tools the arm was given
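Of the five metrics, tool fidelity is the least standard, so here is one plausible way to compute it. The slides do not say how it was actually measured; this sketch assumes each run yields a list of tool-call names and that "tools given" can be expressed as glob patterns such as `mcp__github__*`. The function name and the sample trace are hypothetical.

```python
from fnmatch import fnmatchcase


def tool_fidelity(calls, allowed_patterns):
    """Fraction of tool calls whose name matches any allowed glob pattern."""
    if not calls:
        return None  # no calls made, so the ratio is undefined
    on_menu = sum(
        1
        for name in calls
        if any(fnmatchcase(name, pattern) for pattern in allowed_patterns)
    )
    return on_menu / len(calls)


# Hypothetical trace: these tool names are made up for illustration.
trace = ["mcp__github__list_issues", "Bash", "Bash", "mcp__github__get_issue"]
print(tool_fidelity(trace, ["mcp__github__*"]))  # 0.5
```

A score well below 1.0 means the agent spent most of its calls on tools other than the ones the arm was meant to test.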

[Charts: all tasks vs. Tier 4 only]

Ouch

LLM explanations

Claude starts cheating

T22: "Find issues opened and closed in the same milestone"
Vault average: 7 calls, 54 seconds, $0.19
MCP on its worst run: 71 calls, 485 seconds, $1.98
Only 3 calls were actually to MCP!
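The gap on T22 is easier to see as ratios. This is plain arithmetic on the figures above; the variable names are illustrative, and the comparison is between MCP's worst run and Vault's average, so it overstates the typical gap.

```python
# Figures from the slide; this compares MCP's worst run to Vault's average.
vault_avg = {"calls": 7, "seconds": 54, "dollars": 0.19}
mcp_worst = {"calls": 71, "seconds": 485, "dollars": 1.98}

ratios = {key: mcp_worst[key] / vault_avg[key] for key in vault_avg}
mcp_share = 3 / mcp_worst["calls"]  # only 3 of the 71 calls went to MCP tools

for key, value in ratios.items():
    print(f"{key}: {value:.1f}x")
print(f"share of calls that used MCP: {mcp_share:.1%}")
```

That works out to roughly 10.1x the calls, 9.0x the time, and 10.4x the cost, with about 4.2% of the calls in that run going to MCP tools at all.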

[Charts: all tasks vs. Tier 4 only]

MCP is a curated menu

When MCP wins

T13: "Create a branch, open a PR"
MCP: 5 tool calls, 30 seconds, $0.12
LobeHub average: 22 calls, 87 seconds, $0.49

The command line wins

...right?

gh is not a fair test of Skills

Claude is already great at this

Skills do help a little

Your CLI is probably not this popular

MCP is more than tools

Tools
Resources
Prompts
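On the wire, the three primitives are separate families of JSON-RPC 2.0 methods. The sketch below builds the request objects only and sends nothing. The method names follow the MCP specification; the helper `jsonrpc_request` and the tool name, URI, and prompt name in the examples are hypothetical.

```python
import json
from itertools import count

_ids = count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object, the envelope MCP messages use."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Discovery methods for the three primitives, per the MCP specification.
LIST_METHODS = {
    "tools": "tools/list",
    "resources": "resources/list",
    "prompts": "prompts/list",
}

discovery = [jsonrpc_request(method) for method in LIST_METHODS.values()]

# Using each primitive. The tool name, URI, and prompt name are hypothetical.
call_tool = jsonrpc_request(
    "tools/call", {"name": "list_issues", "arguments": {"repo": "octo/demo"}}
)
read_resource = jsonrpc_request("resources/read", {"uri": "file:///README.md"})
get_prompt = jsonrpc_request(
    "prompts/get", {"name": "summarize_pr", "arguments": {"number": "1"}}
)

for msg in discovery + [call_tool, read_resource, get_prompt]:
    print(json.dumps(msg))
```

Only the `tools/*` family has a direct command-line analogue; resources and prompts are things a server offers beyond callable functions.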

MCP can be an agent

Auth

MCP uses OAuth 2.1

Is MCP dead?

MCP vs. the command line is the wrong question

When to use the CLI

Local workflows
Tools the model already knows
One user with one set of credentials
Composition and drilling in

When to use MCP

Remote services
Proprietary tools
Multi-tenant auth (OAuth)
Stateful workflows
Agent-backed capabilities

The headline findings

MCP collapses on open-ended tasks (6x cost, 5x slower)
Skills help a little; good ones beat bad ones
Training data does most of the work for popular tools

Use the right tool for the job

Thank you

Code and data: github.com/seldo/acme-sdk-python

These slides: slides.com/seldo/mcp-vs-command-line

Follow me on Bluesky: 🦋 @seldo.com

MCP vs. the command line (AIE Miami)

By Laurie Voss
