What are we talking about?


#1: The certificate outage

February 2014

We broke the registry a lot in January 2014

SSL in Node.js used to be a lot harder

Migrations are hard

Rolling back is hard, even harder when everyone is yelling at you.

What did we learn?

  • We are bigger than we thought!
  • We can never change cert providers 🙁
  • Have a rollback plan

#2: The password change outage

April 2014


was not a good idea

Load at 100%, no matter how big a box we buy.


Password change errors

Fucked by silent string conversions

Infinite loops will fuck you

What did we learn?

  • Couch is a terrible fucking database
  • Don't roll your own... anything
  • Don't ignore "small" errors
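
The slides do not say which string conversion caused the password change errors, so the following is only a generic sketch of the failure mode, not the registry's actual bug. It shows how JavaScript turns non-string values into strings without complaint, plus a hypothetical guard (`requireString` is a name invented here) that turns that silence into a loud error.

```typescript
// Hypothetical helper, not taken from the registry codebase: refuse anything
// that is not already a string instead of letting JavaScript coerce it.
function requireString(value: unknown, label: string): string {
  if (typeof value !== "string") {
    throw new TypeError(`${label} must be a string, got ${typeof value}`);
  }
  return value;
}

// What silent coercion looks like: none of these conversions throw.
const coerced: string[] = [
  String(undefined),   // "undefined"
  String(null),        // "null"
  String({ pw: "x" }), // "[object Object]"
  `${["a", "b"]}`,     // "a,b"
];
```

Calling `requireString(input, "password")` at the boundary means a missing or malformed value fails as a visible "small" error instead of being stored as the literal text `"undefined"`.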

#3: left-pad

March 2016

Kik me

Then we fucked up by not having a clear policy

404: left-pad not found

I will feel genuinely bad about left-pad forever

What did we learn?

  • Unpublishes are really dangerous
  • Unpublishes are now impossible after 24 hours

What did we learn?

  • We're even bigger now
  • Ignore vague legal threats
  • Hire a damn lawyer
  • Have clear policies

#4: Cloudflare migration

May 2018

We broke somebody else's registry

Infinite loop AGAIN, motherfucker


Who deploys on a Friday???

What did we learn?

  • Don't deploy on a Friday
  • Don't have hard deadlines based on early estimates
  • We are so big we are responsible for stuff we're not even responsible for

We have fucked up so many more times

I need a longer talk to cover them all

