Are Principles Enough? Do We Have Enough Principles?
HMIA 2025

"Readings"
Video: x [3m21s]
Activity: TBD
PRE-CLASS
CLASS
What is a taxonomy?
Let's list a bunch of principles and then ask how they are related. Are some more general and some more specific?

I will do my best to be
honest and fair,
friendly and helpful,
considerate and caring,
courageous and strong,
and responsible for what I say and do,
and to respect myself and others,
respect authority,
use resources wisely,
make the world a better place,
and be a sister to every Girl Scout.
A Scout is:
TRUSTWORTHY.
LOYAL.
HELPFUL.
FRIENDLY.
COURTEOUS.
KIND.
OBEDIENT.
CHEERFUL.
THRIFTY.
BRAVE.
CLEAN.
REVERENT.
STOP+THINK: what are some mechanisms by which these principles could align the concrete behavior of scouts?
Some Mechanisms that "Implement" Principles
- Signaling & expectation-setting: Publicly stated principles create common knowledge about expected behavior.
- Socialization & habit formation: Repetition, stories, drills, and practice make the principles automatic.
- Identity & belonging: Rituals, symbols, and roles internalize the principles as part of “who I am.”
- Peer norms & mutual monitoring: Small-group feedback, praise, and informal sanctions maintain day-to-day compliance.
- Governance & accountability: Structured reviews, audits, and due process check alignment and correct drift.
- Incentives & club goods: Access to trips, posts, awards, and other benefits is contingent on good standing.
- Credentialized reputation: Externally valued badges/ranks raise the stakes: misbehavior devalues the credential.
- Light contracts & enforcement: Oaths/codes function as promises with checks and proportionate remedies (remediation, delay, suspension).
STOP+THINK: what did a quick search for principles of AI safety, ethics, alignment tell you?
STOP+THINK: who produces lists of such principles?
EVERYBODY!
- National/Subnational Governments & Regulators. E.g., ministries, data protection authorities, city councils.
- Intergovernmental & Supranational Bodies (IGOs). E.g., UN, OECD, EU, Council of Europe.
- Standards Development Organizations (SDOs). E.g., ISO/IEC, IEEE, CEN/CENELEC, NIST (hybrid: national lab + SDO role).
- Professional Associations. E.g., ACM, AMA, BCS, bar associations.
- Industry Associations / Trade Groups. E.g., BSA, CTA, DIGITALEUROPE.
- Private Companies / Research Labs. E.g., tech firms, AI labs.
- Non-governmental Organizations (NGOs) / Civil Society Organizations. E.g., human rights orgs, digital rights groups.
- Think Tanks & Policy Labs. E.g., policy institutes, university centers.
- Academic Consortia & Research Networks. E.g., multi-university initiatives.
- Multistakeholder Initiatives (MSIs) / Alliances. E.g., Partnership on AI, GPAI.
- Certification/Audit Bodies. E.g., assurance firms, conformity assessment orgs.
- Philanthropic Foundations.
- Faith-based / Ethics Councils.

So, What's Going On?
How Do Principles Work - General?
Lists of AI principles function less as ethics for AI agents or AI engineers and more as policy instruments that influence behavior through multiple systems (law, markets, professions, platforms).
They set agendas (framing problems and desired ends) and can be translated into operational controls via soft-law standards.
They create gates and incentives when they are incorporated into procurement terms, platform policies, certification/audits, and even finance/insurance criteria, shaping who can access markets, distribution, and capital.
They inform norms through professional codes and corporate governance, providing a basis for oversight, liability, and internal controls.
And they legitimize behavior recommendations via multistakeholder endorsements and funder conditionality, tying reputation and resources to compliance.
How Do Principles Work - Mechanisms?
- Agenda-setting. Frame problems and desired ends; seed future regulation and policy proposals.
- Soft law & standards. Voluntary but operational; can set the market entry bar.
- Procurement levers. Governments/enterprises require compliance in RFPs.
- Regulatory scaffolding. Regulators publish principles to justify rules and enforcement priorities.
- Professional codes. Principles get into conduct norms, licensing & accreditation.
- Corporate governance. Boards adopt principles informing risk policies and internal controls.
- Certification, audit & assurance. Third-party checklists, attestations, labels. Due diligence hurdles.
- Reputation & PR markets. Principles as brand; media/NGOs watchdog.
- Platform & infrastructure policy. Cloud/model hosts/app stores enforce acceptable-use aligned to principles.
- Finance & insurance gating. Investors (ESG terms) and insurers (underwriting criteria) require safeguards.
- Multistakeholder cover. Neutral convenors articulate shared guardrails so actors can endorse without “taking a side.”
- Philanthropy. Funders publish principles and condition grants on them.
Each instrumental function of lists of principles comes with different carrots and sticks.
Instrumental Function | Carrot | Stick |
---|---|---|
Soft law & standards | interoperability & adoption | de facto market entry bar |
Procurement levers | vendor eligibility | exclusion from large buyers |
Professional codes | status/credential | censure/suspension |
Corporate governance | investor confidence | liability exposure |
Certification, audit & assurance | trust mark, easier sales | can’t pass buyer due-diligence |
Reputation & PR markets | goodwill/talent | shaming, boycotts |
Platform & infrastructure policy | access to distribution | throttling/suspension |
Finance & insurance gating | capital, lower premiums | higher cost or denial |
Multistakeholder cover | legitimacy | reputational exit costs |
Philanthropy | resources | no funding |
The Point: Principles Require Implementation Mechanisms






Activity

- Select one that you think you understand
- What does it mean in machine intelligence alignment context?
- Come up with analogs in human, organizational and expert intelligence realms.
Principle | Machine | Human | Organization | Expert |
---|---|---|---|---|
Activity
EXERCISE: Principles and Subprinciples. Put these in order of General - Intermediate - Concrete/Specific
Beneficence. Act to promote the wellbeing of others; advance human flourishing.
Safety & Robustness. Design systems that minimize risk, resist failure, and ensure benign outcomes even under error.
Non-Maleficence. Do not cause harm while trying to do good.
EXERCISE: Principles and Subprinciples. What goes together?
Beneficence. Act to promote the wellbeing of others; advance human flourishing.
Safety & Robustness. Design systems that minimize risk, resist failure, and ensure benign outcomes even under error.
Non-Maleficence. Do not cause harm while trying to do good.
Accountability. Responsibility must be visible and enforceable.
Auditability. Maintain records and processes that enable review, tracing, and correction.
Alignment as doing good while avoiding harm
Alignment as answerability for action.
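One way to record the outcome of the "What goes together?" exercise is as a small data structure. The sketch below is only an illustration: the `taxonomy` array and the `themeOf` helper are hypothetical names, and the grouping shown is one plausible reading of the exercise, not an answer key.

```javascript
// Illustrative grouping only: one possible reading of the exercise.
const taxonomy = [
  {
    theme: "Alignment as doing good while avoiding harm",
    principles: ["Beneficence", "Non-Maleficence", "Safety & Robustness"],
  },
  {
    theme: "Alignment as answerability for action",
    principles: ["Accountability", "Auditability"],
  },
];

// Look up the theme a principle is filed under.
// Returns undefined when the principle is not filed anywhere.
function themeOf(principle) {
  const group = taxonomy.find((g) => g.principles.includes(principle));
  return group ? group.theme : undefined;
}

console.log(themeOf("Auditability"));
console.log(themeOf("Dignity"));
```

A principle that appears in no group, such as "Dignity" here, is a prompt to ask whether the taxonomy needs another theme or whether the principle is a special case of an existing one.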


Activity/Assignment
- Take the principles listed on the handout and come up with your own list of 6 (consensus, most important, most interesting, etc.).
- Briefly define each one.
- Suggest what each principle means in human, organization, expert, and machine intelligence alignment.
- For each, come up with an example of a concrete failure mode. What happens when humans, organizations, experts, and machines don't live up to this principle?
Repository: alignment-cards
Filename: alignmentcards-v0.js
// Card categories. Each card refers to a category by its "code".
export const categories = [
  {
    "code": "AP",
    "name": "Alignment Principles",
    "pathology": "normative void",
    "color": "#E6FFE9",
    "description": "Alignment principles are contestable, general-purpose, broadly recognized ethical or social or normative commitments that can serve as warrants for recommending or evaluating an agent's course of action in contexts where alignment and cooperation with others matters."
  }
];

// One card per principle: a shared definition, its reading in each of the
// four domains, and a concrete failure mode for each domain.
export const cards = [
  {
    "category": "AP",
    "name": "Beneficence",
    "definition": "Act to promote the well-being of others.",
    "human": "Seeking to improve others' conditions, not just avoid harm.",
    "organizational": "Pursuing mission outcomes that serve societal good.",
    "professional": "Keeping public safety and welfare in sight even while working primarily for the client.",
    "machine": "Designing systems that anticipate and promote human flourishing.",
    "failureModes": {
      "human": "A person drives in a manner that causes traffic backups for others.",
      "organizational": "The classic movie plot where a rapacious billionaire threatens civilization to enrich his company.",
      "professional": "An expert who disregards public interest, acting as if the consequences of what they help build are other people's problems.",
      "machine": "The machine consumes all the world's resources to create as many paperclips as it can."
    }
  },
  // Copy this template and fill in every field.
  {
    "category": "AP",
    "name": "TEMPLATE 1",
    "definition": "basic definition that works across four domains",
    "human": "BRIEFLY: how does it manifest in the human intelligence alignment context?",
    "organizational": "BRIEFLY: how does it manifest in the organizational intelligence alignment context?",
    "professional": "BRIEFLY: how does it manifest in the expert intelligence alignment context?",
    "machine": "BRIEFLY: how does it manifest in the machine intelligence alignment context?",
    "failureModes": {
      "human": "Give concrete example(s).",
      "organizational": "Give concrete example(s).",
      "professional": "Give concrete example(s).",
      "machine": "Give concrete example(s)."
    }
  }
];
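Before submitting a card, it may help to check that every field has been filled in and that no template text is left behind. The following is a minimal sketch of such a check. `validateCard`, `DOMAINS`, and `sampleCard` are hypothetical names that are not part of alignmentcards-v0.js, and the sample card is a shortened copy of the Beneficence card with two gaps introduced on purpose.

```javascript
// Hypothetical helper, not part of alignmentcards-v0.js.
const DOMAINS = ["human", "organizational", "professional", "machine"];

// Returns a list of problems found in one card.
// An empty list means every expected field is present and filled in.
function validateCard(card, categoryCodes) {
  const problems = [];
  if (!categoryCodes.includes(card.category)) {
    problems.push(`unknown category: ${card.category}`);
  }
  for (const field of ["name", "definition", ...DOMAINS]) {
    if (typeof card[field] !== "string" || card[field].trim() === "") {
      problems.push(`missing field: ${field}`);
    }
  }
  const failureModes = card.failureModes || {};
  for (const domain of DOMAINS) {
    const text = failureModes[domain];
    if (typeof text !== "string" || text.trim() === "") {
      problems.push(`missing failure mode: ${domain}`);
    } else if (text.startsWith("Give concrete example")) {
      // Template text that was never replaced.
      problems.push(`placeholder failure mode: ${domain}`);
    }
  }
  return problems;
}

// Shortened copy of the Beneficence card with two deliberate gaps.
const sampleCard = {
  category: "AP",
  name: "Beneficence",
  definition: "Act to promote the well-being of others.",
  human: "Seeking to improve others' conditions, not just avoid harm.",
  organizational: "Pursuing mission outcomes that serve societal good.",
  professional: "Keeping public safety and welfare in sight even while working primarily for the client.",
  machine: "Designing systems that anticipate and promote human flourishing.",
  failureModes: {
    human: "A person drives in a manner that causes traffic backups for others.",
    organizational: "Give concrete example(s).", // left as template text on purpose
    professional: "An expert who disregards public interest.",
    machine: "", // left empty on purpose
  },
};

console.log(validateCard(sampleCard, ["AP"]));
// → [ 'placeholder failure mode: organizational', 'missing failure mode: machine' ]
```

The check only looks at whether fields are present and non-empty; whether a definition or failure mode is any good remains a matter for discussion.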
shreyasi-23
adyyd
angelag13
antisignal
adikondepudi
kien-ship-it
liadenh
AshleyLuoYX
darcy-long
madhu24raj
ramlukn
Parshwa0926
SomeN00b101
stonehj05
Junzhe-Shi0702
edsumpena
riatalwar
2derpy
arif24v
evodychko
Michellewang375
PRE-CLASS
Safety and security
Transparency and explainability
Fairness and non-discrimination
Human control of technology
Professional responsibility
Promotion of human values
Privacy
- consent
- control over the use of data
- ability to restrict data processing
- right to rectification
- right to erasure
- privacy by design
- recommends data protection laws
Accountability
- accountability per se
- impact assessments
- new regulations
- evaluation/audit requirements
- verifiability and replicability
- liability/legal responsibility
- ability to appeal
- environmental responsibility
- monitoring body
- remedy for automated decision
Safety and security
- Safety means an AI system is reliable and will do what it is supposed to do without harming living beings or the environment.
- Security is an AI system’s ability to resist external threats.
- Security by design means building security into the whole development process as opposed to adding it on after.
- Predictability means the outcome must be consistent with the input, confirming that the AI system has not been compromised by external actors.
Fjeld et al. 2020
- Transparency, explainability, explicability, understandability, interpretability, communication, disclosure, showing
- Justice, fairness, consistency, inclusion, equality, equity, (non-)bias, (non-)discrimination, diversity, plurality, accessibility, reversibility, remedy, redress, challenge, access and distribution
- Non-maleficence, security, safety, harm, protection, precaution, prevention, integrity (bodily or mental), non-subversion
- Responsibility, accountability, liability, acting with integrity
- Privacy, personal or private information
- Beneficence, benefits, well-being, peace, social good, common good
- Freedom, autonomy, consent, choice, self-determination, liberty, empowerment
- Trust.
- Sustainability, environment (nature), energy, resources (energy)
- Dignity.
- Solidarity, social security, cohesion
Jobin et al. 2019
RICE Principles (Ji et al. 2024)
(1) Robustness states that the system’s stability needs to be guaranteed across various environments.
(2) Interpretability states that the operation and decision-making process of the system should be clear and understandable.
(3) Controllability states that the system should be under the guidance and control of humans.
(4) Ethicality states that the system should adhere to society’s norms and values.
Instrumental goals in service of alignment of an AI system with human intentions and values
NEXT Shared Meaning
HMIA 2025 Why Principles Are Not Enough
By Dan Ryan