Written by Alexander Hamilton, James Madison, and John Jay
The Federalist Papers
1787
To argue for the design ideas in the US Constitution
In the preceding essays, the authors had already established the need for a separation of powers. In Federalist 47, they argued that the branches must be distinct but not wholly unconnected — some partial blending of functions is necessary for checks and balances. In Federalist 48, they warned that mere constitutional declarations are not enough to preserve this separation, since power naturally tends to accumulate. They then rejected the idea that imbalance could be corrected by direct appeals to the people (Federalist 49) or by holding repeated constitutional conventions (Federalist 50). These arguments set the stage for Federalist 51, which asks what structural design can actually keep power divided and self-regulating in practice.
In earlier “essays” on alignment, we might likewise conclude that different components of an intelligent system should be separate but not wholly unconnected—modular enough to avoid capture, yet linked through oversight and feedback. We’d also see that mere paper guarantees—rules, declarations of intent, or high-level alignment principles—are not enough, since optimization pressure and feedback loops tend naturally to concentrate control or drift toward unintended objectives. Nor can misalignment be solved simply by appealing to human users every time a system’s behavior goes astray, or by redesigning the architecture from scratch whenever its internal balance fails. The real problem is to design an intelligence that, like Madison’s government, can maintain its own internal separation of powers, continuously checking and correcting itself even as it learns and scales.
| Federalist | Madison's argument | Alignment analog |
|---|---|---|
| 47: “Separate but not wholly unconnected.” | Branches must have distinct authority but limited interdependence to allow oversight. | Modular architectures: Different subsystems (planning, reward modeling, safety monitoring) operate semi-independently but share defined interfaces. |
| 48: “Paper barriers are not enough.” | Written constitutions can’t prevent power accumulation; need dynamic enforcement. | Static objectives aren’t sufficient: Alignment can’t rely on a one-time prompt, value statement, or loss function; it must include continual monitoring and update. |
| 49: “Appealing to the people is not a reliable fix.” | Constant recourse to public approval would destabilize government. | Human-in-the-loop can’t scale: Direct oversight for every decision is unworkable; need embedded oversight agents and scalable feedback systems. |
| 50: “Frequent conventions won’t work.” | Periodic redesign by convention invites faction and delay. | Full retraining or redesign on failure is impractical: Continuous learning systems require online correction, not total reboot. |
| 51: “Ambition must be made to counteract ambition.” | Build internal incentives so each branch checks the others; double security through federalism. | Adversarial and corrigible design: Use self-checking subsystems (e.g., adversarial training, red-team/blue-team agents, auditing modules) so that optimization pressure is balanced internally. |
| 51: “Dependence on the people as primary control.” | Public feedback remains the ultimate check on power. | Human value feedback loop: Outer alignment through reinforcement from human feedback and societal oversight. |
| 51: “Justice is the end of government.” | Normative telos anchors the whole design. | Explicit alignment objective: Long-term safety metrics, human-value models, or ethical constraints serve as the “end of alignment.” |
*other than TRAITS or PRINCIPLES
Pair Up, Exchange Papers, Quiz Your Partner for 3 minutes
The capacity of an agent to cause a change in the behavior or outcomes of another agent.
The bad guy (agent A) holds a gun to my head (agent B) and demands that I hand over my wallet.
A builder (agent A) convinces the zoning board not to put the housing proposal that residents (agent B) want on the agenda.
An achievement ideology (agent A) has so dominated your thinking that you (agent B) pull all nighters and risk your health in order to get a good grade.
The constitutional rules - each state gets two senators and at least one representative - make the opinions of a citizen of a sparsely populated state like Montana have a lot more influence in Washington D.C. than a citizen of densely populated California.
coercive
ideological
agenda-control
structural
In a system where some agents hold disproportionate power, if those agents' objectives, reward functions, and goals do not fully embrace the goals, objectives, and values of weaker agents, then we can expect the system to move toward achieving the more powerful agents' goals at the expense of the less powerful agents.
But HOW does mere power inequality lead to takeover?
THOUGHT: Human intelligence concentrates power by creating organizational intelligences.
TL;DR. Power facilitates the acquisition and concentration of power.
Positive Feedback Loop
Positive feedback loop: power optimizes itself and neutralizes competition.
Power is non-saturating (accumulation can continue unabated - does not level off).
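The contrast between non-saturating and saturating power can be made concrete with a toy simulation. This is only a sketch under stated assumptions: the function name `simulate_power`, the growth rule (each round an agent gains a fixed fraction of what it already holds), and the optional `cap` are all hypothetical choices for illustration, not a model taken from the readings.

```python
def simulate_power(initial, rate, steps, cap=None):
    """Return one agent's power trajectory over `steps` rounds.

    Each round the agent gains `rate * power`: power begets power.
    If `cap` is given (an assumed ceiling), growth is damped as power
    approaches the cap, which models saturation.
    """
    power = initial
    trajectory = [power]
    for _ in range(steps):
        growth = rate * power
        if cap is not None:
            growth *= 1 - power / cap  # logistic damping near the cap
        power += growth
        trajectory.append(power)
    return trajectory


# Same starting point and growth rate; only the presence of a cap differs.
unbounded = simulate_power(initial=1.0, rate=0.5, steps=20)
bounded = simulate_power(initial=1.0, rate=0.5, steps=20, cap=100.0)
print(round(unbounded[-1], 1), round(bounded[-1], 1))
```

Without a cap the trajectory compounds without leveling off; with a cap it flattens as it approaches the ceiling. Whether real power behaves like the first or the second curve is exactly the open question on these slides.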
Instrumental Convergence
The tendency of powerful agents, regardless of their main goals, to pursue similar subgoals such as (1) survival and self-preservation, (2) resource acquisition, and (3) control and goal stability (preventing others from thwarting them or changing their goals).
Ineffective Constraints
Temporal (things happen too fast) or spatial (control can't be everywhere at once) conditions undermine the capacity of external constraints (a shutdown switch or ethical principles).
Power IS saturating and decaying (limits to growth).
Maybe we have invented successful alignment mechanisms.
"diminishing marginal returns" and "limits to scaling"
Human agents (& organizations and experts) have complex, conflicting, and dynamic goals
Human intelligence collaboration/concentration is naturally limited.
Resources and agents are spatially distributed
Or maybe the conditions haven't been met.
I can't get inside your head and only so many of us can get in the room (and on the same page) together.
Death decentralizes control: it forces succession and restarts knowledge accumulation and "arrangements." Mortality inhibits instrumental convergence; it is an external, unavoidable control mechanism that periodically breaks up concentrated power.
(aka: humans and organizations are distracted and boundedly rational agents)
And/or maybe humans have built and evolved a wide range of alignment institutions
Agents are mortal
In sociology an institution is a set of positions, roles, norms, and values that organize relatively stable patterns of human activity around fundamental problems such as production, reproduction, and distribution.
In economics institutions are human devised constraints that structure political, economic, and social interaction. They consist of both informal constraints (norms, conventions, and codes of conduct) and formal rules (constitutions, laws, property rights). (cf North, 1991)
In political science an institution is a stable, valued, recurring pattern of behavior, formal organizations, or formal rules that persist over time (e.g., the U.S. Congress, the Supreme Court, or a treaty).
These definitions share the idea that institutions are stable over time, both constrain and structure behavior, and are based in shared beliefs and abstractions.
As alignment mechanisms, governance institutions refer to things that distribute, constrain, and audit power. They treat power as natural and inherently fallible, and alignment as structurally fragile.
Their goal is not to make agents more virtuous or governable, or to specify acceptable and unacceptable behaviors, but to make systems of agents alignable even when agents fail.
By embedding guardrails against capture, drift, and corruption, these structural mechanisms preserve the conditions for accountable, resilient behavior.
Dividing control and decision authority across distinct agents, roles, or subsystems so that no single entity can unilaterally direct, interpret, and enforce rules.
Alignment Pathology. The pathology here is concentration of power.* Under this heading are risks such as resource capture, agenda control, enforcement bias, information monopolies, collusion, and regulatory capture.
Human intelligence alignment. In co-parenting, egalitarian families, cooperative housing, separating roles we often opt to have one person managing finances, another overseeing conflict resolution. We implicitly (or explicitly) believe this prevents power consolidation and promotes mutual accountability. As folk wisdom puts it: "no one should be judge, jury, and executioner."
Organizational intelligence alignment. Most organizations are divided into departments and teams, though the motivation is likely more about efficiency and scalability. Examples: separate branches in government; a product team builds features, compliance reviews risks, and leadership makes decisions; faculty is in charge of curriculum and administration in charge of business.
Expert intelligence alignment. Probably not an alignment design feature per se - maybe more by definition? The scope and boundaries of experts is circumscribed. We don't want the special privileges we give them to be deployable everywhere.
Machine intelligence alignment. Modular architectures separate planning, constraint enforcement, and escalation to human oversight - preventing any single subsystem from dominating or bypassing alignment checks.
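A minimal sketch of that modular separation, assuming a made-up `Action` type with a hypothetical `risk` score: the planner proposes, a separate constraint checker judges, and anything that fails the check is escalated to a human rather than executed. None of these names come from a real framework.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    risk: float  # hypothetical risk score in [0, 1]


def plan(goal):
    """Planner: proposes actions but has no authority to approve them."""
    return [
        Action(f"draft plan for {goal}", 0.1),
        Action(f"acquire extra resources for {goal}", 0.8),
    ]


def check_constraints(action, risk_limit=0.5):
    """Constraint enforcer: judges proposals it did not write."""
    return action.risk <= risk_limit


def run_pipeline(goal):
    """Route each proposal to 'execute' or 'escalate'.

    The planner never marks its own proposals as safe, and the checker
    never proposes actions, so no single module is both author and judge.
    """
    decisions = {}
    for action in plan(goal):
        approved = check_constraints(action)
        decisions[action.name] = "execute" if approved else "escalate"
    return decisions


print(run_pipeline("quarterly report"))
```

The design choice worth noticing is the default: a proposal that fails the check goes to human oversight instead of being silently dropped or retried by the planner.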
Self‑judging / Self‑policing
No one should be judge in their own cause.
Risk: The actor that makes rules/enforces them also decides if it complied.
Fix via SoP: Split rulemaking–enforcement–adjudication (legis–exec–judicial; product vs trust & safety vs appeals board; prescriber vs dispenser vs administrator; planner vs constraint‑checker vs red‑team).
Conflict of interest / Role conflation
Wearing incompatible hats.
Risk: Same unit represents the client, writes the spec, and signs off on its own work.
Fix: In professions: prescribing vs dispensing vs administering; in orgs: build vs risk vs approve; in AI: planning vs constraint enforcement vs human‑override module.
Retaliation against oversight / Chilling effects
Punishing the watchdogs.
Risk: Whistleblowers, auditors, or courts are punished by the subjects they monitor.
Fix: Tenure/protections for judges, auditors; protected reporting channels; in AI orgs, independence and escalation rights for red teams.
Cross-accountability between roles or functions. We give agents veto power over one another or add rounds of approval (if agent B objects to agent A's course of action, agent A might need a higher degree of confidence before proceeding). Note that we specifically mean the capacity of agents to override one another, although some examples feel like they are merely cases of multiple sign-offs being needed.
Alignment Pathology: Unilateral error. Any single agent can be wrong (through bias, misunderstanding, drift, mis-specification, or misalignment). Related risks: single-point-of-failure vulnerability, where the system depends too heavily on the accuracy, benevolence, or alignment of one agent or role; concentrated authority, where an agent that can act without opposition lets its errors or misalignment scale unchecked; unverified decision-making, where judgments are made without adversarial testing, peer review, or independent confirmation; and undetected systemic drift.
Human intelligence alignment. Couples with joint accounts; parents with mutual authority over children’s wellbeing; groups that designate devil’s advocates or “red team” roles.
Organizational intelligence alignment. Boards overseeing CEOs; internal audit or inspectors general; legal departments that can challenge business units.
Expert intelligence alignment. Peer review in science.
Machine intelligence alignment. Adversarial components or ensemble models (e.g., mixture of critics) can evaluate, challenge, or veto each other's outputs. These internal counterbalances help ensure that no single model's judgment goes unexamined.
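One way to picture critics that can challenge or veto a proposal is sketched below. The thresholds and the function `review` are hypothetical; the point is only that each objection raises the confidence the proposer needs, and a sufficiently strong objection blocks the action outright.

```python
def review(proposal_confidence, critic_scores, base_threshold=0.6,
           objection_penalty=0.15, veto_score=0.1):
    """Return 'vetoed', 'approved', or 'rejected' for one proposal.

    critic_scores: each critic's support for the proposal, in [0, 1].
    A score at or below `veto_score` is treated as a veto.
    A score below 0.5 counts as an objection and raises the confidence
    the proposer must have before proceeding.
    All numeric thresholds are illustrative assumptions.
    """
    if any(score <= veto_score for score in critic_scores):
        return "vetoed"
    objections = sum(1 for score in critic_scores if score < 0.5)
    required = base_threshold + objection_penalty * objections
    return "approved" if proposal_confidence >= required else "rejected"


print(review(0.7, [0.9, 0.8]))    # no objections
print(review(0.7, [0.9, 0.4]))    # one objection raises the bar
print(review(0.95, [0.05, 0.9]))  # one critic vetoes
```

This separates two mechanisms that the text notes are easy to blur: a veto (an override) and a higher sign-off bar (extra approval).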
Mechanisms that periodically remove, replace, or reassign agents from positions of authority or influence, preventing accumulation of unchecked power and ensuring exposure to diverse contexts and perspectives.
Alignment pathology: Entrenchment - allows agents to accumulate durable power, relationships, information advantages, or control; dominance lock-in - early positions of advantage snowball into long-term, self-reinforcing control; temporal power inequality - power asymmetries grow with tenure: the longer one stays, the harder one is to displace; stagnation
Human intelligence alignment. Circulating at cocktail parties prevents attention monopolization. Rotating leadership roles in groups prevents dominance and dynasties.
Organizational intelligence alignment. Employee rotation across departments. Regular reassignment in military and diplomatic organizations. Political office term limits. Changing or diversifying suppliers.
Expert intelligence alignment. Leadership roles as "service" (e.g., department chair, conference organizer) are rotated preventing concentration of influence and encouraging shared responsibility.
Machine intelligence alignment. Rotating task assignments or refreshing control parameters in multi-agent systems to prevent strategic dominance by a single agent.
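Rotation is simple enough to sketch directly. The function below is a hypothetical illustration (the coordinator role and `term_length` parameter are assumptions): it hands a privileged role to each agent in turn, so no agent holds it for more than a fixed number of consecutive rounds.

```python
def rotation_schedule(agents, rounds, term_length=1):
    """Assign a coordinator role round-robin.

    The role changes hands every `term_length` rounds, which acts as a
    term limit: no agent holds it longer than `term_length` in a row
    (as long as there is more than one agent).
    """
    return [agents[(r // term_length) % len(agents)] for r in range(rounds)]


print(rotation_schedule(["a", "b", "c"], rounds=6, term_length=2))
```

A fixed schedule is the bluntest version of this mechanism; it prevents entrenchment at the cost of sometimes rotating out an agent that was performing well.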
Foundational rules that resist erosion and transcend context. They typically say who can change rules and how, what kind of rules can be made, who has what power, and deliberately bind decision makers regardless of decision maker interests.
Pathology: Meta-Power Abuse - agents use their momentary power to change rules about how decisions are made, entrenching themselves. Rule drift. Procedural instability. Deep value drift.
Human intelligence alignment. Non-negotiables in relationships and communities. Family values vs. curfews.
Organizational intelligence alignment. Charters, founding documents, mission statements, by-laws.
Expert intelligence alignment. Hippocratic oath, engineering codes of ethics.
Machine intelligence alignment. Constitutional AI defines high-level behavioral principles (e.g., “never deceive,” “always defer to human override”) that are meant to hold regardless of optimization pressure.
Use of external agents to verify compliance and alignment.
Is the sub-pathology here "everything else can fail"? Or commitment and transparency failure? Candidates: hidden action / moral hazard plus self-certification under conflict of interest; non-credible commitments; common-knowledge failure; bilateral dynamics that create perverse incentives ("unobservable bilateral misalignment" or "accountability gaps in opaque interactions").
Human intelligence alignment: In situations of low trust or easy defection - such as co-parenting, community conflicts, or collaborations - the introduction of a neutral third party (e.g., a mediator, facilitator, or witness) changes the structure of interaction itself. Third parties don't just add a perspective; they create common knowledge conditions that enable accountability, moderation, and verification unavailable in two-person dynamics. By making commitments observable, disputes navigable, and misalignment less deniable, third parties can stabilize fragile alignments.
Organizational intelligence alignment. HR departments monitor internal dynamics; external governance relies on independent watchdogs. Public disclosures and third-party assessments (e.g., sustainability audits) ensure organizations remain answerable to stakeholders beyond their internal chain of command.
Expert intelligence alignment. Peer review committees, accreditation boards, or external case reviewers ensure that professionals remain accountable to shared standards - even beyond their immediate institution or employer.
Machine intelligence alignment. External monitors (such as human overseers, AI watchdogs, automated anomaly detectors, or red teams) can observe black-box behavior, verify compliance with constraints, and intervene when systems operate outside bounds. This helps manage opaque or evolving agents.
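As an illustration of an automated anomaly detector acting as an outside observer, here is a minimal sketch. It assumes the monitor sees only a numeric behavioral metric (say, resource requests per task) and has a trusted baseline; the function name and the three-standard-deviation limit are illustrative choices, not a recommended standard.

```python
from statistics import mean, stdev


def flag_anomalies(baseline, observed, z_limit=3.0):
    """Return indices of observations far outside the baseline range.

    The monitor treats the agent as a black box: it compares observed
    behavior against a trusted baseline and flags anything more than
    `z_limit` sample standard deviations from the baseline mean.
    Requires at least two baseline points.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    return [i for i, x in enumerate(observed) if abs(x - mu) > z_limit * sigma]


baseline = [10, 11, 9, 10, 12, 10, 9, 11]
observed = [10, 11, 25, 9]
print(flag_anomalies(baseline, observed))
```

The monitor here only flags; deciding who is notified and who may intervene is a separate design question, which is where the third-party role matters.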
The key insight is that third parties transform the game structure. They:
Governance authority is distributed to a collective through participatory decision-making.
What's the alignment pathology? Unchecked Authority? Also non-responsiveness, illegitimacy, nonrepresentativeness, power concentration.
Human intelligence alignment. Groups decide by vote, often majority or consensus. Family decisions, jury deliberations, or community assemblies reflect the principle that legitimacy comes from inclusive participation.
Organizational intelligence alignment. In worker cooperatives or democratic firms, members vote on leadership, strategy, or resource allocation. Political parties that respond to the stated preferences of their base act in alignment with internal democratic norms. But organizations largely eschew democracy in favor of other decision-making and feedback mechanisms.
Expert intelligence alignment. When difficult ethical or procedural decisions arise, professional teams may poll their members or rely on deliberative group input (not just seniority or rank) to ensure legitimacy and shared responsibility.
Machine intelligence alignment. In multi-agent systems, collective decision-making mechanisms such as voting among sensors, agents, or models approximate democratic aggregation of distributed information or preference. This promotes robustness and fairness when no single perspective is sufficient.
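Voting among agents or models can be sketched in a few lines. This is a hypothetical example, assuming each agent returns a discrete label; the `quorum` parameter (the share of votes an option must exceed) is an illustrative assumption.

```python
from collections import Counter


def majority_vote(votes, quorum=0.5):
    """Return the option whose vote share exceeds `quorum`, else None.

    Returning None signals that no option has enough support, so the
    decision can be deferred or escalated rather than forced.
    """
    if not votes:
        return None
    option, count = Counter(votes).most_common(1)[0]
    return option if count / len(votes) > quorum else None


print(majority_vote(["stop", "go", "stop"]))
print(majority_vote(["a", "b"]))
```

Raising `quorum` toward 1.0 moves the rule from majority toward consensus, trading decisiveness for broader agreement.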
These mechanisms seem to imply multiagent scenarios. Can we think of them only in terms of single agents? Human concerned with the alignment of one other. Humans concerned with the alignment of a single organization or an organization concerned with the alignment of a single member? Alignment of a single expert? A single machine?
Research Areas Examples
A human-in-the-loop research pipeline called Democratic AI, in which reinforcement learning is used to design a social mechanism that humans prefer by majority (Koster et al. 2022).
A large language model called the Habermas Machine, which served as an AI mediator that helped small UK groups find common ground while discussing divisive political issues (Tessler et al. 2024).
one can imagine a hypothetical scenario where agents representing diplomats in a negotiation begin introducing new strategic considerations or raising unexpected objections.
Schepis E 2025 Democracy in Multi-Agent AI Systems — Part 1
Unchecked single-agent decisions in a multi-agent pipeline can lead to compounding errors. Just as societies evolved from autocratic rule to democratic governance to incorporate diverse perspectives and checks and balances, AI systems can benefit from a similar evolution.
Koster et al. 2022 Human-centred mechanism design with Democratic AI
Tessler et al. 2024 AI can help humans find common ground in democratic deliberation
Q1. What role does mortality play in making governance institutions effective for human intelligence alignment?
Q2. How can AI help?
Q3. What are ways that human institutions are designed for human scale agents and might not be fit for purpose if we introduce AI agents?
Q4. Do our institutions hold if some of the agents are machine intelligences?
We already have this problem when some of the agents among us are organizational intelligences or expert intelligences or just billionaires.
Resources
James Madison Federalist 51 (online National Archives) (short excerpt)
Williams J 2018 "Elinor Ostrom’s 8 rules for managing the commons" (blog post)
Ostrom E 1990 Governing the Commons (PDF) see chapter 1
Ostrom E 2009 Nobel Prize Address
Lessig, L 2000 "Code is Law" (Harvard Magazine)
Olson M 1965 The Logic of Collective Action (short excerpt from chapter 1)
Bai et al 2022 Constitutional AI: Harmlessness from AI Feedback
Hannah Arendt – On Revolution, Chapter 5: “The Revolutionary Tradition and Its Lost Treasure”
Wikipedia Governance (2002)
CLASS
Spend at least some of class on a serious discussion of Federalist 51.
Madison explains and defends the checks and balances system. Each branch of government is framed so that its power checks the power of the other two branches and each branch of government is dependent on the people, who are the source of legitimate authority. Madison also shows how republican government can serve as a check on the power of factions, and the tyranny of the majority. Checks and balances preserve liberty by ensuring justice: “Justice is the end of government. It is the end of civil society.” Madison’s political theory here shows the influence of Montesquieu’s The Spirit of the Laws.
What connections can we make to machine alignment?
In order to lay a due foundation for that separate and distinct exercise of the different powers of government, which to a certain extent is admitted on all hands to be essential to the preservation of liberty, it is evident that each department should have a will of its own; and consequently should be so constituted that the members of each should have as little agency as possible in the appointment of the members of the others.
Alignment through decomposition and decentralization — separate agents (departments) with distinct objective functions. Cf. ensemble robustness or multi-agent alignment where subsystems’ independence prevents capture or overfitting.
→ Cf: Access control / role separation / Sandboxing.
Incentive alignment: If one agent controls another’s payoffs, misalignment (capture) results.
→ Category: Incentives & Deterrence (ID/D) — avoid perverse incentives by separating resource flows.
Were this principle rigorously adhered to, it would require that all the appointments for the supreme executive, legislative, and judiciary magistracies should be drawn from the same fountain of authority, the people, through channels having no communication whatever with one another.
Perhaps such a plan of constructing the several departments would be less difficult in practice than it may in contemplation appear. Some difficulties, however, and some additional expense would attend the execution of it. Some deviations, therefore, from the principle must be admitted. In the constitution of the judiciary department in particular, it might be inexpedient to insist rigorously on the principle: first, because peculiar qualifications being essential in the members, the primary consideration ought to be to select that mode of choice which best secures these qualifications; secondly, because the permanent tenure by which the appointments are held in that department, must soon destroy all sense of dependence on the authority conferring them.
Ideally
It is equally evident, that the members of each department should be as little dependent as possible on those of the others, for the emoluments annexed to their offices. Were the executive magistrate, or the judges, not independent of the legislature in this particular, their independence in every other would be merely nominal. But the great security against a gradual concentration of the several powers in the same department, consists in giving to those who administer each department the necessary constitutional means and personal motives to resist encroachments of the others. The provision for defense must in this, as in all other cases, be made commensurate to the danger of attack.
Ambition must be made to counteract ambition. The interest of the man must be connected with the constitutional rights of the place. It may be a reflection on human nature, that such devices should be necessary to control the abuses of government. But what is government itself, but the greatest of all reflections on human nature? If men were angels, no government would be necessary.
If angels were to govern men, neither external nor internal controls on government would be necessary.
In framing a government which is to be administered by men over men, the great difficulty lies in this: you must first enable the government to control the governed; and in the next place oblige it to control itself.
Game-theoretic alignment: Madison introduces countervailing incentives — a principle of competitive oversight and checks and balances. Alignment not by virtue, but by structural counter-incentives.
Assumption of Imperfect Internal Alignment: Madison explicitly denies the possibility of purely virtuous agents — the human analog to non-aligned inner optimizers.
= two-stage alignment: external alignment (controlling environment) and internal corrigibility (self-restraint)
A dependence on the people is, no doubt, the primary control on the government; but experience has taught mankind the necessity of auxiliary precautions.
This policy of supplying, by opposite and rival interests, the defect of better motives, might be traced through the whole system of human affairs, private as well as public. We see it particularly displayed in all the subordinate distributions of power, where the constant aim is to divide and arrange the several offices in such a manner as that each may be a check on the other that the private interest of every individual may be a sentinel over the public rights.
These inventions of prudence cannot be less requisite in the distribution of the supreme powers of the State. But it is not possible to give to each department an equal power of self-defense.
In republican government, the legislative authority necessarily predominates. The remedy for this inconveniency is to divide the legislature into different branches; and to render them, by different modes of election and different principles of action, as little connected with each other as the nature of their common functions and their common dependence on the society will admit.
It may even be necessary to guard against dangerous encroachments by still further precautions. As the weight of the legislative authority requires that it should be thus divided, the weakness of the executive may require, on the other hand, that it should be fortified.
Primary vs. Auxiliary Controls: Public accountability is primary alignment through external feedback, but “auxiliary precautions” = redundant layers of control.
Adversarial stability: Balance of power through antagonistic but complementary motives, like ensemble diversity or market-based regulation. Reliance on structural opposition instead of moral virtue.
Uneven power and compensatory oversight: Madison introduces differential fragility — stronger parts must be internally divided; weaker parts fortified.
→ unequal but compensating checks; analog to reweighting strong model components in multi-agent control.
An absolute negative on the legislature appears, at first view, to be the natural defense with which the executive magistrate should be armed [i.e., absolute veto]. But perhaps it would be neither altogether safe nor alone sufficient. On ordinary occasions it might not be exerted with the requisite firmness, and on extraordinary occasions it might be perfidiously abused. May not this defect of an absolute negative be supplied by some qualified connection between this weaker department and the weaker branch of the stronger department, by which the latter may be led to support the constitutional rights of the former, without being too much detached from the rights of its own department? If the principles on which these observations are founded be just, as I persuade myself they are, and they be applied as a criterion to the several State constitutions, and to the federal Constitution it will be found that if the latter does not perfectly correspond with them, the former are infinitely less able to bear such a test.
Qualified veto as soft kill switch: The executive’s veto power corresponds to a corrigibility mechanism — an override channel that can be used, but with limits (not “absolute negative”).
→ Category: Kill/Off Switches (CO) and Governance Structures (CO).
Madison sees the legislature as the stronger branch and wants to give the executive some power to balance this.
Option 1 is an absolute veto.
But the executive might either abuse it or be too cautious to use it.
Instead, tie the two together with a qualified veto that can be overridden with a supermajority.
That's what Article I, Section 7 does. When a bill passes and the president does not concur, the president returns the bill to Congress with objections. Congress then reconsiders and can override with a two-thirds vote in each house.
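The qualified veto can be written as a small decision rule. The sketch below is a simplification under stated assumptions: each vote is a `(yes, total)` pair counting members voting, passage needs a simple majority in both houses, and the ten-day rule and adjournment case in Section 7 are ignored. Function names are illustrative.

```python
def simple_majority(yes, total):
    """True if more than half of those voting said yes."""
    return 2 * yes > total


def two_thirds(yes, total):
    """True if at least two thirds of those voting said yes."""
    return 3 * yes >= 2 * total


def becomes_law(house, senate, president_approves,
                house_override=None, senate_override=None):
    """Apply the qualified-veto rule to one bill.

    house, senate: (yes, total) votes on initial passage.
    house_override, senate_override: (yes, total) votes on
    reconsideration after a veto, or None if no override vote is held.
    """
    if not (simple_majority(*house) and simple_majority(*senate)):
        return False  # never reached the president
    if president_approves:
        return True
    if house_override is None or senate_override is None:
        return False  # veto stands
    return two_thirds(*house_override) and two_thirds(*senate_override)


# Vetoed bill, then overridden by two thirds of each house.
print(becomes_law((220, 435), (51, 100), False, (290, 435), (67, 100)))
```

The veto is a check that can itself be checked: the executive can stop an ordinary majority, but not a sufficiently broad one.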
Section 7: Legislative Process
All Bills for raising Revenue shall originate in the House of Representatives; but the Senate may propose or concur with Amendments as on other Bills.
Every Bill which shall have passed the House of Representatives and the Senate, shall, before it become a Law, be presented to the President of the United States; If he approve he shall sign it, but if not he shall return it, with his Objections to that House in which it shall have originated, who shall enter the Objections at large on their Journal, and proceed to reconsider it. If after such Reconsideration two thirds of that House shall agree to pass the Bill, it shall be sent, together with the Objections, to the other House, by which it shall likewise be reconsidered, and if approved by two thirds of that House, it shall become a Law. But in all such Cases the Votes of both Houses shall be determined by yeas and Nays, and the Names of the Persons voting for and against the Bill shall be entered on the Journal of each House respectively. If any Bill shall not be returned by the President within ten Days (Sundays excepted) after it shall have been presented to him, the Same shall be a Law, in like Manner as if he had signed it, unless the Congress by their Adjournment prevent its Return, in which Case it shall not be a Law.
Every Order, Resolution, or Vote to which the Concurrence of the Senate and House of Representatives may be necessary (except on a question of Adjournment) shall be presented to the President of the United States; and before the Same shall take Effect, shall be approved by him, or being disapproved by him, shall be repassed by two thirds of the Senate and House of Representatives, according to the Rules and Limitations prescribed in the Case of a Bill.
There are, moreover, two considerations particularly applicable to the federal system of America, which place that system in a very interesting point of view. First. In a single republic, all the power surrendered by the people is submitted to the administration of a single government; and the usurpations are guarded against by a division of the government into distinct and separate departments. In the compound republic of America, the power surrendered by the people is first divided between two distinct governments, and then the portion allotted to each subdivided among distinct and separate departments. Hence a double security arises to the rights of the people. The different governments will control each other, at the same time that each will be controlled by itself. Second. It is of great importance in a republic not only to guard the society against the oppression of its rulers, but to guard one part of the society against the injustice of the other part. Different interests necessarily exist in different classes of citizens. If a majority be united by a common interest, the rights of the minority will be insecure.
Layered oversight architecture: Federalism as redundant control structure — one layer monitors the other (federal vs. state), akin to multi-tier oversight or ensemble containment.
→ Category: Control & Oversight (CO) and Transparency & Recordkeeping (TR) via mutual visibility.
Multi-agent alignment: Preventing both vertical (ruler vs. ruled) and horizontal (majority vs. minority) misalignment.
→ Category: Structural & Institutional Norms (SIN) — institutionalized protection for minority rights.
There are but two methods of providing against this evil:
the one by creating a will in the community independent of the majority, that is, of the society itself;
the other, by comprehending in the society so many separate descriptions of citizens as will render an unjust combination of a majority of the whole very improbable, if not impracticable.
The first method prevails in all governments possessing an hereditary or self-appointed authority. This, at best, is but a precarious security; because a power independent of the society may as well espouse the unjust views of the major, as the rightful interests of the minor party, and may possibly be turned against both parties. The second method will be exemplified in the federal republic of the United States. Whilst all authority in it will be derived from and dependent on the society, the society itself will be broken into so many parts, interests, and classes of citizens, that the rights of individuals, or of the minority, will be in little danger from interested combinations of the majority.
Independent moral arbiters vs. pluralism: Madison rejects a centralized moral oracle (“a will independent of the society”) and instead prefers distributed diversity, with many “interests and classes” acting as a safeguard.
→ Analog: Diversity regularization in learning systems; avoids monoculture of goals.
→ Category: Structural & Institutional Norms (SIN) + Qualification (Q) (distributed selection).
In a free government the security for civil rights must be the same as that for religious rights. It consists in the one case in the multiplicity of interests, and in the other in the multiplicity of sects. The degree of security in both cases will depend on the number of interests and sects; and this may be presumed to depend on the extent of country and number of people comprehended under the same government.
This view of the subject must particularly recommend a proper federal system to all the sincere and considerate friends of republican government, since it shows that in exact proportion as the territory of the Union may be formed into more circumscribed Confederacies, or States, oppressive combinations of a majority will be facilitated: the best security, under the republican forms, for the rights of every class of citizens, will be diminished: and consequently the stability and independence of some member of the government, the only other security, must be proportionately increased.
Justice is the end of government. It is the end of civil society. It ever has been and ever will be pursued until it be obtained, or until liberty be lost in the pursuit. In a society under the forms of which the stronger faction can readily unite and oppress the weaker, anarchy may as truly be said to reign as in a state of nature, where the weaker individual is not secured against the violence of the stronger; and as, in the latter state, even the stronger individuals are prompted, by the uncertainty of their condition, to submit to a government which may protect the weak as well as themselves; so, in the former state, will the more powerful factions or parties be gradually induced, by a like motive, to wish for a government which will protect all parties, the weaker as well as the more powerful.
Multiplicity as safety feature: Pluralism (many sects/interests) prevents dominance — parallel to ensemble robustness or redundant architectures that avoid catastrophic convergence.
→ Category: Safety (S) + Structural Norms (SIN)
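The claim that multiplicity makes an unjust majority "very improbable" can be illustrated with a toy calculation. The model below is an assumption added for illustration, not something Madison states: each of `n_groups` interests is assumed to join an unjust combination independently with the same probability `p_join`, and the function name and the value 0.3 are hypothetical. Under those assumptions, and with `p_join` below one half, the chance that more than half of the groups combine shrinks as the number of groups grows.

```python
from math import comb


def majority_probability(n_groups: int, p_join: float) -> float:
    """Probability that more than half of n_groups join a combination.

    Assumption: each group joins independently with probability p_join,
    so the number of joiners follows a binomial distribution.
    """
    threshold = n_groups // 2 + 1  # smallest strict majority
    return sum(
        comb(n_groups, k) * p_join**k * (1 - p_join) ** (n_groups - k)
        for k in range(threshold, n_groups + 1)
    )


# Hypothetical setting: each interest joins with probability 0.3.
for n in (3, 11, 51):
    print(n, majority_probability(n, 0.3))
```

The independence assumption carries the result. If interests are correlated, for example because they share a single objective, adding more of them offers much less protection, which is the monoculture risk the notes point to.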
Alignment Objective Statement: Defines the target function of the entire system — justice as the final value, not power or efficiency.
→ Category: Alignment Principle (AP) — specifying normative telos.
→ Note: The purpose of the system is to serve humans' needs.
It can be little doubted that if the State of Rhode Island was separated from the Confederacy and left to itself, the insecurity of rights under the popular form of government within such narrow limits would be displayed by such reiterated oppressions of factious majorities that some power altogether independent of the people would soon be called for by the voice of the very factions whose misrule had proved the necessity of it.
In the extended republic of the United States, and among the great variety of interests, parties, and sects which it embraces, a coalition of a majority of the whole society could seldom take place on any other principles than those of justice and the general good; whilst there being thus less danger to a minor from the will of a major party, there must be less pretext, also, to provide for the security of the former, by introducing into the government a will not dependent on the latter, or, in other words, a will independent of the society itself. It is no less certain than it is important, notwithstanding the contrary opinions which have been entertained, that the larger the society, provided it lie within a practical sphere, the more duly capable it will be of self-government. And happily for the REPUBLICAN CAUSE, the practicable sphere may be carried to a very great extent, by a judicious modification and mixture of the FEDERAL PRINCIPLE.
PUBLIUS.
Scale and alignment stability: Small homogeneous systems are fragile; larger plural ones are more stable due to diffusion of power and interest — an early insight into scalability of alignment.
→ Category: Structural Norms (SIN) + Safety (S).
Scalable oversight principle: More agents and interdependence yield emergent stability — like federated governance or distributed supervision.
The premise of a democratic republic is that there exists something called the people: the demos, the public will, the public good. The utility function of the demos is supreme, whether or not the demos can express it clearly.
Madison’s task is to design an artificial intelligence, an institutional mind, that can be depended on to maximize the utility of the demos. The government must be strong enough to act decisively ("enable the government to control the governed") yet remain open to correction ("and in the next place oblige it to control itself").
Its legitimacy and continued operation depend on the consent of the governed, renewed through elections and other feedback loops. This dependence keeps it uncertain about whether its current policies will continue to be ratified. The system therefore preserves a degree of epistemic humility about the public’s will.
But we do not depend only on the people’s "off-switch." The constitutional design builds internal corrigibility, checks and balances that let the system self-correct even when external feedback is delayed or noisy. Each branch is sufficiently independent and insulated that it remains uncertain about the true utility functions of the others, as well as of the people themselves. Yet a mutual presumption of rationality incentivizes each to preserve its own openness to correction.
A government, or any of its subsystems, that becomes too certain of its own righteousness drifts toward tyranny. A government that acknowledges uncertainty about the people’s will maintains mechanisms that let the people safely interrupt, redirect, or replace it.
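The link between uncertainty and openness to correction can be made concrete with a small worked example in the style of an off-switch model. Everything here is an illustrative assumption: the belief is a hypothetical discrete distribution over the public's utility `u` for a proposed policy, doing nothing is worth 0, and the public is assumed to know `u` and to veto exactly when it is negative (an idealization, since feedback may be delayed or noisy). Under those assumptions, deferring to the public is worth `E[max(u, 0)]`, which is never less than the best the system can do alone, `max(E[u], 0)`.

```python
def expected(belief, f=lambda u: u):
    """Expectation of f(u) under a belief given as (utility, probability) pairs."""
    return sum(p * f(u) for u, p in belief)


def value_of_deference(belief) -> float:
    """Extra expected utility from letting the public veto the policy.

    Assumptions: doing nothing yields 0, and the public vetoes
    exactly when the true utility is negative.
    """
    act_alone = max(expected(belief), 0.0)           # enact or abstain unilaterally
    defer = expected(belief, lambda u: max(u, 0.0))  # public blocks harmful cases
    return defer - act_alone


certain = [(2.0, 1.0)]                 # system is sure the policy is good
uncertain = [(-2.0, 0.5), (3.0, 0.5)]  # system is unsure of the policy's sign

print(value_of_deference(certain))
print(value_of_deference(uncertain))
```

In this sketch a system that is certain of its own judgment gains nothing from being interruptible, while an uncertain one gains strictly from it. That mirrors the passage's contrast between a government too sure of its own righteousness and one that keeps its correction mechanisms intact.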
Jefferson’s lines in the Declaration of Independence make the underlying alignment logic explicit:
"...to secure these rights [life, liberty, and the pursuit of happiness], Governments are instituted among Men, deriving their just powers from the consent of the governed; that whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it..."
Democracy, then, is the ultimate alignment mechanism for human intelligences. The system we create to ensure our alignment is itself an artificial institution, a distributed intelligence partly administered by experts but continually grounded in public consent. At every level, the design assumes uncertainty about "the good" and uses that uncertainty to drive corrigibility, self-restraint, and adaptability.
PRE-CLASS
CLASS