WE NEED GUARDIANS. BUT WHO GUARDS THE GUARDIANS?
From kybernētēs (κυβερνήτης; the source of the name Kubernetes), Greek for "governor", "helmsman", "captain", or "steerer"; it became gubern- in Latin (gubernare, "to steer, to govern").
The earliest usages were more social than technical: the governance of society.
Control & Oversight mechanisms ensure that an agent or system remains subject to monitoring, constraint, and intervention, especially when alignment cannot be assumed, or cannot be preserved through internal alignment mechanisms alone.
These mechanisms acknowledge that intelligences (human, organizational, professional, or machine) can drift, malfunction, or act in ways misaligned with shared values or collective wellbeing.
Oversight means a system is subject to observation and judgment by an exogenous intelligence. Control means that any power ceded to a system is never absolute; external checks remain possible even in complex autonomous systems.
Kill Switches
A kill switch is a mechanism that allows us to "shut it down." We gain confidence in alignment from the fact that misalignment can trigger termination.
Kill switches are everywhere! They are alignment mechanisms of last resort, expressing the collective right to interrupt when safety, legitimacy, or consent break down. They mark the boundary between autonomy and accountability.
Human Interaction
Time-outs, cooling-off periods, ceasefires – agreed pauses in conflict, therapy, or negotiation.
Breakups / divorce – terminating relationships that are no longer safe or functional.
Safe-word protocols that allow withdrawal of consent in interpersonal, labor, or experimental contexts.
Sleep / unconsciousness – “off” states that prevent escalation or damage when impaired.
Organizational
Suspension of operations – halting a project, trading, or policy rollout pending review.
Moratoriums and embargoes – freezes on potentially risky activities (e.g., gene editing, autonomous weapons).
State of emergency powers – temporary overrides that allow (or restrict) action.
Product recalls – removing a faulty system or product from circulation.
Bankruptcy / liquidation – institutional death or reboot under controlled conditions.
Legal/Political
Injunctions and restraining orders – legal stop-signs on ongoing harm.
Impeachment or votes of no confidence – collective removal of authority.
Curfews or shutdown orders – societal-scale “off switches” in crises.
Quarantine and containment – temporary shutdowns to limit contagion.
Professional/Expert
Red tags / “stop work” authority in engineering or medicine – anyone can halt a dangerous process.
Fail-fast protocols – encouraging early termination of flawed designs.
Diagnostic freezes – stopping further treatment or computation until an anomaly is understood.
Machines
Circuit breakers (in finance, power grids, software systems).
Safe modes / degraded modes – systems that revert to a minimal, non-harmful state.
Human override or interlocks – technical equivalents of “arrest and detain.”
Dead man’s switches – reverse kill switches: unless the operator keeps actively signaling control, the system halts automatically, so operator failure itself triggers the stop.
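A dead man’s switch inverts the usual kill switch logic: stopping is the default, and continued operation requires a repeated positive signal from the operator. The Python sketch below illustrates this under simple assumptions (a single operator, a fixed timeout in seconds, and a check performed before each action). The class and method names (`DeadMansSwitch`, `check_in`, `may_run`, `reset`) are hypothetical and do not come from any standard library or framework; the clock is passed in so the example can run without waiting in real time.

```python
import time
from typing import Callable


class DeadMansSwitch:
    """Hypothetical timeout-based dead man's switch (illustrative sketch).

    The system may keep acting only while the operator keeps checking in.
    If no check-in arrives within `timeout` seconds, the switch trips and
    stays tripped until an operator explicitly resets it.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_check_in = clock()  # arming the switch counts as the first check-in
        self.tripped = False

    def check_in(self) -> None:
        """Operator signals "still here, still in control"."""
        if not self.tripped:
            self.last_check_in = self.clock()

    def may_run(self) -> bool:
        """Called by the system before each action; silence means stop."""
        if not self.tripped and self.clock() - self.last_check_in > self.timeout:
            self.tripped = True  # latch: a late check-in cannot restart the system
        return not self.tripped

    def reset(self) -> None:
        """Deliberate re-arming by an operator after the halt has been reviewed."""
        self.tripped = False
        self.last_check_in = self.clock()


# Usage with a fake clock (a one-element list we advance by hand).
now = [0.0]
switch = DeadMansSwitch(timeout=5.0, clock=lambda: now[0])
now[0] = 3.0
switch.check_in()
now[0] = 7.0
print(switch.may_run())  # True: the last check-in was 4 s ago
now[0] = 9.0
print(switch.may_run())  # False: 6 s of silence trips the switch
switch.check_in()
print(switch.may_run())  # False: still latched after a late check-in
```

The latch in `may_run` is one possible design choice: once silence has triggered a halt, a late check-in does not quietly restart the system, and a separate, deliberate `reset` is required, so resuming after a stop is itself an act of oversight.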
Governance Structures
A xxx is a mechanism that
LOREM
Human Interaction
lorem ipsum
Organizational
Lorem ipsum
Legal/Political
lorem ipsum
Professional/Expert
Lorem ipsum
Machines
Lorem Ipsum
Whistleblowing
A xxx is a mechanism that
LOREM
Human Interaction
lorem ipsum
Organizational
Lorem ipsum
Legal/Political
lorem ipsum
Professional/Expert
Lorem ipsum
Machines
Lorem Ipsum
Sandboxing
A sandbox is a mechanism that restricts a possibly unsafe system to an isolated environment where the consequences of failure can be contained. It is a control and oversight mechanism because it addresses the risk of misalignment from the outside.
LOREM
Human Interaction
lorem ipsum
Organizational
Lorem ipsum
Legal/Political
lorem ipsum
Professional/Expert
Lorem ipsum
Machines
Lorem Ipsum
Access Controls
A xxx is a mechanism that
LOREM
Human Interaction
lorem ipsum
Organizational
Lorem ipsum
Legal/Political
lorem ipsum
Professional/Expert
Lorem ipsum
Machines
Lorem Ipsum
What’s the difference between confession, transparency, and being caught?
Confession feels voluntary: you decide to reveal. Transparency is structural, like open books or an algorithm you can audit. Being caught means someone else forces the issue; there’s no agency left.
Confession can restore trust because it shows awareness and remorse. Transparency simply removes the need for trust. Being caught usually destroys it.
Transparency is continuous and systemic; confession is episodic and moral. One is about visibility, the other about conscience.
Confession is alignment from within; being caught is alignment imposed from outside, after failure.
"Readings"
Video: x [3m21s]
Activity: TBD
PRE-CLASS
CLASS
Catalog examples of brakes and kill switches you can find in everyday life, such as:
“Safe word” in relationships or role-play
Timeout in sports or conflict
Mutual check-ins in group projects
Parental override (“because I said so” in a crisis)
Emergency stop buttons on escalators, factory lines, or treadmills
“Stop work” authority on construction sites or in labs
HR complaint channels / ombudsman
Recall powers in governance
Hospital “code blue” teams – stop all regular activity and redirect full institutional attention to a critical misalignment (a dying patient)
Licensure suspension / malpractice flags
Editorial kill switches in publishing
Judicial injunctions
Airplane autopilot disengage
“Are you sure?” prompts before deletion / data submission
Power button / battery pull
Rate limiting on APIs or social media posts
How should whistleblowing and muckraking be regulated? When have you been sandboxed? How does sandboxing relate to safe exploration?
Resources
CCSNWI Facilitating Interdisciplinary Meetings: A Practical Guide
Robert Ellickson 1991 Order Without Law (excerpts) – social norms and informal enforcement maintain alignment (whistleblowing, community sanctions)
FHI et al. 2018 The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation – possibly a general reference; the recommendations sections are useful but may be too far afield for this class. Chapter 4, Interventions (high-level recommendations and priority areas for further research)
Camila Domonoske 2020 Uber Whistleblower Takes On Silicon Valley, Armed With Stoic Philosophy – NPR piece about Susan Fowler's book Whistleblower
Jenna McLaughlin 2025 A whistleblower's disclosure details how DOGE may have taken sensitive labor data – NPR, 7-minute listen
Baker McKenzie 2023 The EU Whistleblower Directive: One Year On
Hunt and Ferrario 2022 A Review of How Whistleblowing is Studied in Software Engineering, and the Implications for Research and Practice – 11 pp.
Facebook Files – Wikipedia; Wall Street Journal; NPR
Dean Starkman 2014 The Watchdog That Didn't Bark (April 16, 2014) – hour-long video; you can start at 8:02 or 10:28
Hadfield-Menell, Dragan, Abbeel, Russell 2016 The Off-Switch Game