MCP vs. the command line

a head-to-head evaluation of agent tool integration patterns

2026-04-22 AI Engineer Miami

What are we talking about?

  • What is MCP? What's a skill?
  • Why are we talking about this?
  • An experiment: MCP vs. skills
  • What the data says
  • What this means for you

What is MCP?

  • 97 million monthly SDK downloads
  • 17,000+ active MCP servers

What is a Skill?

The debate

Simon Willison, October 2025

Pieter Levels, March 2026

Vibes vs. data

Audience vote

Why GitHub?

The test bed

25 tasks across 4 tiers

  • Tier 1: trivial reads (5 tasks)
  • Tier 2: harder reads (6 tasks)
  • Tier 3: writes (6 tasks)
  • Tier 4: complex analysis (8 tasks)

Three arms

Arm 1: MCP

@modelcontextprotocol/server-github
allowed_tools = ["mcp__github__*"]

Arm 2: LobeHub skill

2,187 lines

lobehub.com/skills

Arm 3: Vault skill

The rules

  • Same agent: Claude Agent SDK
  • Same base prompt
  • Same max turns (25)
  • Same model (Claude Opus 4.6)
  • Different tooling instructions

What's an evaluation?

Testing LLMs using other LLMs

Arize AI

Datasets and experiments

Five metrics

  • correctness: LLM-as-judge vs. ground truth
  • output_quality: LLM-as-judge on readability
  • latency: wall-clock time
  • cost: in dollars
  • tool_fidelity: % of calls using tools given

All tasks

Tier 4 only

All tasks

Tier 4 only

Ouch

LLM explanations

Claude starts cheating

  • T22: "Find issues opened and closed in the same milestone"
  • Vault average: 7 calls, 54 seconds, $0.19
  • MCP on its worst run: 71 calls, 485 seconds, $1.98
  • Only 3 calls were actually to MCP!

All tasks

Tier 4 only

MCP is a curated menu

When MCP wins

  • T13: "Create a branch, open a PR"
  • MCP: 5 tool calls, 30 seconds, $0.12
  • LobeHub average: 22 calls, 87 seconds, $0.49

The command line wins

...right?

gh is not a fair test of Skills

Claude is already great at this

Skills do help a little

Your CLI

is probably not this popular

MCP is more than tools

  • Tools
  • Resources
  • Prompts

MCP can be an agent

Auth

MCP uses OAuth 2.1

Is MCP dead?

MCP vs. Command Line

is the wrong question

When to use the CLI

  • Local workflows
  • Tools the model already knows
  • One user with one set of credentials
  • Composition and drilling in

When to use MCP

  • Remote services
  • Proprietary tools
  • Multi-tenant auth (OAuth)
  • Stateful workflows
  • Agent-backed capabilities

The headline findings

  • MCP collapses on open-ended tasks (6x cost, 5x slower)
  • Skills help a little — good ones beat bad ones
  • Training data does most of the work for popular tools

Use the right tool for the job

Thank you

Code and data: github.com/seldo/acme-sdk-python

 

These slides: slides.com/seldo/mcp-vs-command-line

 

Follow me on Bluesky: 🦋 @seldo.com

MCP vs. the command line (AIE Miami)

By Laurie Voss

MCP vs. the command line (AIE Miami)

  • 32