All those times
we broke the registry
MelbJS, 2019-06-19
Read these slides on your device:
What are we talking about?
Warning:
extreme levels of profanity ahead
I stole this talk idea
from
#1: The certificate outage
February 2014
We broke the registry a lot in January 2014
SSL in Node.js used to be a lot harder
Migrations are hard
This is going to be a theme
DOUBLE FUCK
Rolling back is hard,
even harder when everyone is yelling at you.
What did we learn?
- We are bigger than we thought!
- We can never change cert providers 🙁
- Have a rollback plan
#2: The password change outage
April 2014
Our CouchDB
app was bad
I'm told there are better ones.
Load at 100%
no matter how big a box we buy.
WTF?
Password change errors
Who has time to fix them when the server is always on fire?
Fucked by silent string conversions
Infinite loops will fuck you
What did we learn?
- We are not good at CouchDB
- Don't roll your own... anything
- Don't ignore "small" errors
#3: Registry 2.0 launch
April 2015
Farewell, Couch app
What did we learn?
- Use canaries to test big changes before they go out to everybody.
Canaries use real data
And real data is extremely messy
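One way to picture a canary: send a small, stable slice of real traffic to the new system and leave everyone else on the old one. The sketch below is only an illustration, not how the registry did it. The names `bucketFor` and `routeTo` and the percentage split are assumptions made up for this example.

```typescript
// Hypothetical sketch: route a fixed percentage of callers to the canary.
// FNV-1a is used only to get a stable, reasonably even bucket per key.
function bucketFor(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100; // bucket in the range 0..99
}

// The same key always lands in the same bucket, so a given caller
// sees consistent behaviour for the whole rollout.
function routeTo(key: string, canaryPercent: number): "canary" | "stable" {
  return bucketFor(key) < canaryPercent ? "canary" : "stable";
}
```

Raising `canaryPercent` in small steps widens the blast radius gradually, and setting it back to 0 is the rollback.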
#4: left-pad
March 2016
Kik me
Then we fucked up
by not having a clear policy
404: left-pad not found
I will feel genuinely bad about left-pad forever
What did we learn?
- Unpublishes are really dangerous
- Unpublishes are impossible after 24 hours
What did we learn?
- We're even bigger now
- Ignore vague legal threats
- Hire a damn lawyer
- Have clear policies
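A clear policy is one that can be written down as a check. As a minimal sketch, the 24-hour unpublish rule from the lessons above could look like this. The function name and the millisecond timestamps are assumptions for illustration, not npm's actual implementation.

```typescript
// Hypothetical sketch of the 24-hour unpublish window.
const UNPUBLISH_WINDOW_MS = 24 * 60 * 60 * 1000;

// Returns true only while the package version is still inside the window.
// A publish time in the future is treated as invalid and rejected.
function canUnpublish(publishedAtMs: number, nowMs: number): boolean {
  const ageMs = nowMs - publishedAtMs;
  return ageMs >= 0 && ageMs <= UNPUBLISH_WINDOW_MS;
}
```

The point of encoding it is that nobody has to make a judgment call under pressure: the answer is the same no matter who is asking.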
#5: "fs" unpublished
August 2016
fs on npm
Does literally nothing.
fs in node
Does everything with the file system.
How dangerous can it be to unpublish something that doesn't do anything?
Oh, you sweet summer child.
So we put it back
It still gets downloaded 400,000 times per week.
What did we learn?
- Don't unpublish things. FFS.
- Internal process is important
#6: VS Code takes down the registry
November 2016
VS Code was just trying to be helpful
404 is an error
Do you cache error responses?
We didn't cache 404s
And neither did VS Code.
What did we learn?
- Cache error states
- VS Code is very popular
- TypeScript is very popular
- Microsoft is way nicer than it was in the 90s
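"Cache error states" can be sketched as a cache that remembers 404s for a short time as well as 200s, so repeated requests for a package that does not exist stop hitting the origin. This is a simplified illustration: `createCachingProxy`, the synchronous `fetchOrigin` callback and both TTL values are assumptions for the example, not the registry's real code.

```typescript
type OriginResponse = { status: number; body: string };
type CacheEntry = OriginResponse & { expiresAt: number };

// Hypothetical sketch: wrap an origin lookup with a cache that also
// stores 404 responses (negative caching), each with its own TTL.
function createCachingProxy(
  fetchOrigin: (path: string) => OriginResponse,
  okTtlMs: number,
  notFoundTtlMs: number,
): (path: string, nowMs: number) => OriginResponse {
  const entries = new Map<string, CacheEntry>();
  return (path, nowMs) => {
    const hit = entries.get(path);
    if (hit !== undefined && hit.expiresAt > nowMs) {
      return { status: hit.status, body: hit.body };
    }
    const fresh = fetchOrigin(path);
    // Only 200 and 404 are cached here; other statuses always go to the origin.
    if (fresh.status === 200 || fresh.status === 404) {
      const ttl = fresh.status === 200 ? okTtlMs : notFoundTtlMs;
      entries.set(path, { status: fresh.status, body: fresh.body, expiresAt: nowMs + ttl });
    }
    return fresh;
  };
}
```

Keeping the 404 TTL short is a deliberate trade-off: a newly published package becomes visible quickly, while a flood of requests for a missing one mostly stays in the cache.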
#7: Nuked the payments database
December 2017
Scoped packages:
@user/name
Scoped packages can be private
Customer support are powerful people
DELETE FROM Customers;
See, this is why I hate ORMs.
What did we learn?
- ORMs are dangerous
- Always test your backups
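The slides don't say exactly how the unscoped DELETE was generated, so the following is only a sketch of one common guard: refuse to build a DELETE when the conditions are empty or contain a missing value. `buildDelete` is a hypothetical helper, and the `$1`-style placeholders assume a PostgreSQL-like driver.

```typescript
type WhereValue = string | number | null | undefined;

const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical sketch: build a parameterised DELETE, and throw instead of
// silently producing "DELETE FROM table" when a condition is missing.
function buildDelete(
  table: string,
  where: Record<string, WhereValue>,
): { sql: string; params: Array<string | number> } {
  if (!IDENTIFIER.test(table)) throw new Error(`Invalid table name: ${table}`);
  const columns = Object.keys(where);
  if (columns.length === 0) {
    throw new Error("Refusing to DELETE without a WHERE clause");
  }
  const params: Array<string | number> = [];
  const clauses = columns.map((column, index) => {
    if (!IDENTIFIER.test(column)) throw new Error(`Invalid column name: ${column}`);
    const value = where[column];
    if (value === null || value === undefined) {
      throw new Error(`Refusing to DELETE: condition "${column}" has no value`);
    }
    params.push(value);
    return `${column} = $${index + 1}`;
  });
  return { sql: `DELETE FROM ${table} WHERE ${clauses.join(" AND ")}`, params };
}
```

A guard like this does not replace tested backups. It only makes the most destructive mistake fail loudly instead of succeeding quietly.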
#8: require-from-string
January 2018
Spam
It's why we can't have nice things
npm package pages have really great PageRank
You have to delete spam
Smyte: remember that name.
Sometimes real things look like spam
What did we learn?
- STOP UNPUBLISHING THINGS FFS
- Be careful giving robots too much power
- Spammers are persistent fuckers
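One way to limit a robot's power is to let it act alone only when the blast radius is tiny and to queue everything else for a human. The sketch below is purely illustrative: the `Verdict` fields, the `decide` function and every threshold are made-up assumptions, not how npm or Smyte worked.

```typescript
type Verdict = {
  packageName: string;
  spamScore: number;       // 0..1, from a hypothetical classifier
  weeklyDownloads: number;
  dependents: number;      // packages that depend on this one
};

type Action = "remove" | "human-review" | "keep";

// Hypothetical limits; real values would need tuning against real data.
const LIMITS = { spamThreshold: 0.9, maxDownloads: 100, maxDependents: 0 };

function decide(v: Verdict): Action {
  if (v.spamScore < LIMITS.spamThreshold) return "keep";
  // Looks like spam, but real people may rely on it: never auto-remove.
  if (v.weeklyDownloads > LIMITS.maxDownloads || v.dependents > LIMITS.maxDependents) {
    return "human-review";
  }
  return "remove";
}
```

With a rule like this, a real package that merely looks like spam ends up in a review queue instead of disappearing from the registry.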
#9: Cloudflare migration
May 2018
We broke somebody else's registry
Infinite loop AGAIN, motherfucker
DOUBLE FUCK
Who deploys on a Friday???
What did we learn?
- Don't deploy on a Friday
- Don't have hard deadlines based on early estimates
- We are so big we are responsible for stuff we're not even responsible for
#10: Smyte smites us
June 2018
Smyte turned off their API with 30 minutes of notice
Tweeting angrily will definitely help
What did we learn?
- Plan for absurd failures
- Beware of cheap APIs
- Never tweet
We will fuck up again
We are particularly good at finding
new ways to accidentally delete things
@seldo
These slides are available right now
Now would be a good time to follow me on Twitter
I ❤️ you
All those times we broke the registry
By seldo