The Death of the Code Review

What are we talking about today?

The bottleneck by the numbers

  • +741% lines of code written
  • +30% software shipped

Coding 8x faster to ship 1.3x faster

is a gigantic waste of effort
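As a rough check on those two figures: a +741% increase is 8.41 times the baseline, and +30% is 1.3 times. The sketch below only converts the percentages into multipliers. The function and variable names are illustrative, and dividing one multiplier by the other assumes both percentages are measured against the same baseline, which the slide does not state.

```python
def growth_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplier (+30% -> 1.3x)."""
    return 1 + percent_increase / 100


# Figures from the slide above.
code_multiplier = growth_multiplier(741)  # lines of code written
ship_multiplier = growth_multiplier(30)   # software shipped

# Assumption: both figures share a baseline, so the ratio is meaningful.
code_per_shipped_unit = code_multiplier / ship_multiplier

print(f"Code written: {code_multiplier:.2f}x")
print(f"Software shipped: {ship_multiplier:.2f}x")
print(f"Code written per unit shipped: {code_per_shipped_unit:.1f}x")
```

Under that assumption, roughly six and a half times as much code is being written for each unit of software that ships.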

The cost of producing

plausible code has collapsed.

The cost of knowing

whether to trust it has not.

Generation

is no longer the bottleneck

You can't just review harder

It takes three to four days

to review a 10k line pull request

Some people just

stopped reading the code

"Humans may review pull requests, but aren't required to."

— OpenAI

Automate the reviews entirely

"Passes tests" is not

the same as mergeable

Tests don't capture everything

A benchmark for

"would you merge this?"

Mergeable is a much harder test

  • Fable 5 on SWE-Bench Pro: 80.3%
  • Fable 5 on FrontierCode: 29.3%

If you figure out mergeability, the models will train on it

Anything cheap to verify

gets beaten

gets trained on

Whoever writes

the definition of good

writes next year's models

Who's actually doing

automatic reviews at scale?

The primary task is

managing false positives

Multi-pass gets results

They had to tell it to

trust the code less

Reviewers don't just review,

they patch

Everyone's metric is

"did the human accept it?"
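A minimal sketch of how a "did the human accept it?" metric could be computed. The `ReviewComment` record and its fields are hypothetical, not taken from any vendor's tooling, and what counts as "accepted" (applied the patch, resolved the thread, and so on) is an assumption each team would have to define.

```python
from dataclasses import dataclass


@dataclass
class ReviewComment:
    """One comment left by an automated reviewer (hypothetical schema)."""
    pr_id: str
    accepted: bool  # assumption: True if a human acted on the comment


def acceptance_rate(comments: list[ReviewComment]) -> float:
    """Fraction of automated comments that a human accepted."""
    if not comments:
        return 0.0  # no comments yet, so report zero rather than divide by zero
    return sum(c.accepted for c in comments) / len(comments)


# Illustrative data only, not real results.
sample = [
    ReviewComment("pr-1", True),
    ReviewComment("pr-1", False),
    ReviewComment("pr-2", True),
    ReviewComment("pr-3", True),
]
print(f"Acceptance rate: {acceptance_rate(sample):.0%}")
```

A metric like this measures whether humans agreed with the reviewer, not whether the reviewer was right, which is why the question of human review comes next.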

Can you get away without

any human review?

There was still a human

on the loop

A million lines of Rust.

Nobody read the diff.

Thirteen thousand assertions nobody checked

  • 13,044 unsafe blocks in the port
  • 73 in a comparable hand-written project

What skipping review

actually costs

OpenAI spent

every Friday on slop reduction

No, we can't get away

without humans

"Please, please read the code."

Tests verify

what someone encoded

The human moves up a level

Someone still has to

review the reviewer

"this action is not hardened against prompt injection attacks and should only be used to review trusted PRs."

— Anthropic

Automated reviewers

can be misled

88% of the time

The last reviewer is production

Evals are your last line of defense

You knew I would mention evals eventually

We can go faster,

at a higher level of abstraction

The teams that win will be the ones who can trust what they shipped

What to do now

Thanks!

I'm @seldo.com on Bluesky 🦋

Come to our World Cup watch party today at 5pm!