Saving Time

Susan Sons

ISO Emeritus, NTPSec

Senior Systems Analyst, IU CACR

@HedgeMage  //  http://security.engineering

“Never doubt that a small group of thoughtful, committed citizens can change the world. Indeed, it is the only thing that ever has.”

― Margaret Mead

What is NTP?

NTP is...

  • Network Time Protocol: the primary way most computers throughout the world find out what time it is, and maintain synchronization with one another and the actual passage of time.
     
  • The reference implementation, in software, of that protocol: both the server and client side, plus the algorithms that use that information to regulate system clocks.

In February 2015, NTP was also a gigantic mess.

  • Not yet C99 compliant.

  • Fragile build system.

  • Documentation between six and thirty years out of date.

  • Code locked up in a proprietary SCM system.

  • Technical debt dating back decades.

The Security Nightmare

NTP was Critical

  • Finance
  • Cryptography & Authentication
  • Logging: Systems Administration & Security
  • Navigation & Location
  • Networking
  • SCIENCE!

NTP was Insecure

  • Vulnerability patches going public on a months-to-years response cycle.

  • Patches circulated in private were weaponized and used to exploit servers across the internet.

  • Lack of access to development history made it difficult to audit the code and/or take on improvements as a drive-by contributor.

  • The overall state of the software, build infrastructure, and community made NTP brittle, full of vulnerabilities, and difficult to improve.

“given enough eyeballs, all bugs are shallow”

--Linus's Law, as formulated by Eric S. Raymond

I learned how deep the rabbit hole went...

No OSS gets broken to the point of crisis without a driving set of systemic social problems.  If these are not addressed, any repair will be short-term, as the underlying cause of the original technical problems will continue to cause new technical problems.

To his credit...

NTP's maintainer asked for help.

In NTP's Case:

  • Poor resource allocation
     
  • Hostility to new contributors
     
  • Clinging to broken process and tooling as a mechanism of control.

The Rescue

Bringing Order to Chaos

Step Zero:

Decide you are going to be responsible.

Any Critical Software Rescue:

  • Set a clear, concrete, finite scope.
     
  • Expect drama.  Forgive drama.
     
  • Spend time with people -- split technical and social leadership positions if needed to make this investment possible.
     
  • Keep perspective: the purpose of a rescue is long-term sustainability.  Any other goal may be sacrificed to support this one.

How do you set a scope when you know there are unseen bugs lurking everywhere, and you are not deeply familiar with the code base?

The code's needs are the clearest part of the project scope.

  • Fixing bugs is temporary. More bugs are coming.

  • Long-term impact comes from making bugs easier to fix, and eliminating or preventing classes of bugs.

  • A good rescue results in a long tail of bug fixing.

High-Return Technical Improvements:

  • Code Access
  • Build Process
  • Testing Infrastructure and Automation
  • Documentation
  • Refactors that accomplish:
    • Major code reduction
    • Major improvements in internal compartmentation
    • Major tightening of internal APIs
    • Migration away from dangerous dependencies
  • Bugs that are immediate security crises.

What this meant for the NTP rescue's technical goals:

  • Migrate from BitKeeper to git
  • Replace brittle build system with a modern, Waf-based build.
  • Update documentation enough to start onboarding new developers.
  • Fix as many security problems as possible before our time and money ran out.

Code Longevity:

  • Repository & Access
  • Build System
  • Tests
  • Documentation
  • Communication Channels
  • Personnel

People, Drama, and Project Sustainability

...by which I mean that I know everybody, including people who are better coders than I, and better wielders-of-bureaucracy, and people who know people...

I was lucky.

We needed programmers

  • Ready IMMEDIATELY

  • Familiar with ancient C code

  • Experienced in Linux/UNIX systems programming

  • Capable of working on highly critical code

  • With some idea how time works

  • Who care about open source and security

  • Who can spend a lot of time on this.

We also needed:

  • A way to keep those programmers fed

  • Help with documentation and toolchain work

  • Means to demonstrate to the existing NTP community that we weren't abandoning them

  • An understanding of the existing install base that we didn't have

  • The means to maintain the code, documentation, and community post-rescue

  • Some way to convince people to actually deploy the thing

Harlan Stenn - NTP Classic Maintainer

Adam Nuwer - Volunteer Sysadmin, Community Member

Von Welch - My Boss, CACR Director, CTSC PI

Anita Nikolich - NSF PM for CTSC

Members of the NTP Classic Community

Tim Minick - then of Gemini Observatory

Eric Raymond - (yes, that ESR) GPSd maintainer, Software Architect

Gary Miller - GPSd Software Architect

Amar Takhar - former NTP Classic team member, build system geek

Leslee Cooper - CACR Admin Director, got me an awesome student intern (NaLette Brodnax) for docs work!

Many, many people who answered nosy questions about their NTP usage.

Mark Atwood - Took the handoff as NTPSec Project Manager

Daniel Franke - Took the handoff as NTPSec ISO

Many other people I've failed to name.

NTP Classic

Two administrative staff.

One fundraiser.

One developer.

 

2-4 semi-active community members.

Rescue Team

Susan Sons,  PM / ISO

Eric Raymond,  lead dev

Gary Miller,  developer

NaLette Brodnax,  docs

Amar Takhar,  tools dev

 

...and a handful of concerned community members.

Much to my personal disappointment...

...I didn't find myself writing code on this one.

It turned out that I had some great (read: better than me) systems programmers to hand, including a more experienced software architect.

I was able to help out with some specific information security concerns, in my role as Information Security Officer, and play Security Architect as needed, but my biggest impact was undoubtedly making the project run...

What it takes to manage a critical software rescue:

  • Deep understanding of the problem domain, of software engineering process in general, and of people.
    The worst mistake one can make is to misidentify the problem.
  • Relationships: find the right people at the right time.
  • A little resilience: always be calm, be ready to adapt, and stand between your team and as much of the chaos as possible.
  • EITHER coding and software architecture expertise OR a close, long-standing working relationship with a coder and software architect who will be key to the rescue.

I can't teach you my whole process in this talk, but...

  • When Sputnik crashes down on your head, resist the urge to react immediately, unless it's to prevent immediate loss of life.  Gather information, start identifying the problem and scoping a response, and talk to people.
  • Do not try to make a smooth-running project with no margin for error.  Planning for drama and messiness and being able to absorb it is a winning strategy.
  • Write.  Write down your background planning, your thinking, your project scope.  Then, communicate with people face-to-face (or by teleconference) and follow up in writing.
  • Be kinder to everyone than you need to be, and be empathetic even when people are wrong.  Not because you're a sap, but because it's how you get people to do the things you want.

So, how did the story end?

As of October 2016...

  • NTPSec has a team of two senior developers, one experienced project manager, one junior developer, one information security officer, and one toolchain maintainer-slash-sysadmin, aided by about a dozen interested and engaged community members.
     
  • Due to a reduction in code of over 2/3 (from 227kLOC to 74kLOC), NTPSec was immune to over 50% of NTP Classic vulns BEFORE discovery in the last year.
     
  • NTPSec patches security vulnerabilities, on average, in under 12 hours from discovery.  Note that publication is sometimes slowed to coordinate with NTP Classic releases.
     
  • NTPSec's vulnerability response, along with funders threatening to pull out, has pressured NTP Classic to speed up their response from months-to-years to days-to-weeks.

I moved on...

NTPSec's core team has been through a lot, but we still meet up about once a year and hang out, because it was a wild ride with good people.  I was given an emeritus title when I stepped down last spring, in the hope that I'd remain "part of the family".

There is still so much vulnerable infrastructure software...

Pony Factor

How many currently active committers account for >50% of the code base?

Based on research by Daniel Gruno of Snoot.io
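
As a rough illustration of the question above, here is a minimal sketch of a Pony Factor calculation. It assumes each committer's contribution is available as a single number (commits or lines authored, say), which this deck does not specify. The function name, the committer names, and the counts are all made up for the example and are not taken from the original research.

```python
def pony_factor(contributions, threshold=0.5):
    """Return the smallest number of committers whose combined
    contribution exceeds `threshold` (a fraction) of the total.

    `contributions` maps a committer name to a non-negative count
    (hypothetical unit: commits or lines authored).
    """
    total = sum(contributions.values())
    if total <= 0:
        return 0  # nothing has been contributed, so there is nothing to cover

    running = 0
    # Take the largest contributors first, so the count is minimal.
    ranked = sorted(contributions.values(), reverse=True)
    for count, amount in enumerate(ranked, start=1):
        running += amount
        if running > threshold * total:
            return count
    return len(contributions)


# Made-up projects, for illustration only.
one_pony = {"alice": 700, "bob": 200, "carol": 100}
spread_out = {"dana": 30, "eli": 30, "fay": 20, "gus": 20}

print(pony_factor(one_pony))    # alice alone holds 70% of the total
print(pony_factor(spread_out))  # the two largest contributors hold 60%
```

A low result means the project depends on very few people, so one departure or one compromised account puts most of the code base at risk.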

Why does it matter?

  • NTP
  • OpenSSL (think Heartbleed)
  • Bash (think Shellshock)
  • Costs of personnel turnover
  • Costs of neglect
  • Risk of malicious compromise

Active committers in widely used OSS projects:

Image credit: Dave Nalley

The sky is falling...

...but it's going to be okay.

This is what I do.

What I want from you is a little bit of help.

Do something about crumbling, insecure internet infrastructure

Questions/comments/etc welcome:

sesons@iu.edu

sons@security.engineering

Many Thanks!

To Wikimedia Foundation for their awesome library of freely reusable media, which spared you from my toddler-like drawing ability.

To Indiana University's Center for Applied Cybersecurity Research, and specifically the NSF-funded Center for Trustworthy Scientific Cyberinfrastructure, who funded the NTP Rescue project.  Also to the Internet Civil Engineering Institute, who aided with organization and developer resources.

To O'Reilly, for bringing me here to tell you this story.

To the NTP Security Project team, who made sure the rescue effort didn't go to waste.  NTPSec is poised to replace NTP Classic in the coming year in installations around the world.

To the countless individual humans along the way who did NOT say

"this is somebody else's problem".

Using and Sharing This Work:

Saving Time by Susan Sons is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Permissions beyond the scope of this license may be available; send inquiries to hedgemage@binaryredneck.net.


How a few committed people helped hold up the internet...again.
