Are Principles Enough? Do We Have Enough Principles?

HMIA 2025

"Readings"

Video: x [3m21s]

Activity: TBD

PRE-CLASS

CLASS

What is a taxonomy?

Let's list a bunch of principles and then ask how they are related. Are some more general and some more specific?

I will do my best to be
honest and fair,
friendly and helpful,
considerate and caring,
courageous and strong,
and responsible for what I say and do,
and to respect myself and others,
respect authority,
use resources wisely,
make the world a better place,
and be a sister to every Girl Scout.

A Scout is:

TRUSTWORTHY.

LOYAL.

HELPFUL.

FRIENDLY.

COURTEOUS.

KIND.

OBEDIENT.

CHEERFUL.

THRIFTY.

BRAVE.

CLEAN.

REVERENT.

STOP+THINK: what are some mechanisms by which these principles
could align the concrete behavior of scouts?

Some Mechanisms that "Implement" Principles

  • Signaling & expectation-setting: Publicly stated principles create common knowledge about expected behavior.

  • Socialization & habit formation: Repetition, stories, drills, and practice make the principles automatic.

  • Identity & belonging: Rituals, symbols, and roles internalize the principles as part of “who I am.”

  • Peer norms & mutual monitoring: Small-group feedback, praise, and informal sanctions maintain day-to-day compliance.

  • Governance & accountability: Structured reviews, audits, and due process check alignment and correct drift.

  • Incentives & club goods: Access to trips, posts, awards, and other benefits is contingent on good standing.

  • Credentialized reputation: Externally valued badges/ranks raise the stakes—misbehavior devalues the credential.

  • Light contracts & enforcement: Oaths/codes function as promises with checks and proportionate remedies (remediation, delay, suspension).

STOP+THINK: what did a quick search for principles of AI safety, ethics, alignment tell you?

STOP+THINK: who produces lists of such principles?

EVERYBODY!

  • National/Subnational Governments & Regulators. E.g., ministries, data protection authorities, city councils.  

  • Intergovernmental & Supranational Bodies (IGOs). E.g., UN, OECD, EU, Council of Europe

  • Standards Development Organizations (SDOs). E.g., ISO/IEC, IEEE, CEN/CENELEC, NIST (hybrid: national lab + SDO role)

  • Professional Associations. E.g., ACM, AMA, BCS, bar associations

  • Industry Associations / Trade Groups. E.g., BSA, CTA, DIGITALEUROPE

  • Private Companies / Research Labs. E.g., tech firms, AI labs

  • Non-governmental organizations (NGOs) / Civil Society Organizations. E.g., Human Rights orgs, digital rights groups

  • Think Tanks & Policy Labs. E.g., policy institutes, university centers

  • Academic Consortia & Research Networks. E.g., multi-university initiatives

  • Multistakeholder Initiatives (MSIs)/Alliances. E.g., Partnership on AI, GPAI

  • Certification/Audit Bodies. E.g., assurance firms, conformity assessment orgs

  • Philanthropic Foundations

  • Faith-based / Ethics Councils

So, What's Going On?

How Do Principles Work - General?


Lists of AI principles function less as ethics for AI agents or AI engineers and more as policy instruments that influence behavior through multiple systems (law, markets, professions, platforms).

 

They set agendas (framing problems and desired ends) and can be translated into operational controls via soft-law standards.

 

They create gates and incentives when they are incorporated into procurement terms, platform policies, certification/audits, and even finance/insurance criteria, shaping who can access markets, distribution, and capital.

 

They inform norms through professional codes and corporate governance, providing a basis for oversight, liability, and internal controls.

 

And they legitimize behavior recommendations via multistakeholder endorsements and funder conditionality, tying reputation and resources to compliance.

How Do Principles Work - Mechanisms?

Agenda-setting. Frame problems and desired ends; seed future regulation and policy proposals.

Soft law & standards. Voluntary but operational; can set the market entry bar.

Procurement levers. Governments/enterprises require compliance in RFPs.

Regulatory scaffolding. Regulators publish principles to justify rules and enforcement priorities.

Professional codes. Principles get into conduct norms, licensing & accreditation.

Corporate governance. Boards adopt principles that inform risk policies and internal controls.

Certification, audit & assurance. Third-party checklists, attestations, labels. Due diligence hurdles.

Reputation & PR markets. Principles as brand; media/NGOs act as watchdogs.

Platform & infrastructure policy. Cloud/model hosts/app stores enforce acceptable-use policies aligned to principles.

Finance & insurance gating. Investors (ESG terms) and insurers (underwriting criteria) require safeguards.

Multistakeholder cover. Neutral convenors articulate shared guardrails so actors can endorse without “taking a side.”

Philanthropy. Funders publish principles and condition grants on them.

Each instrumental function of lists of principles comes with different carrots and sticks.

Instrumental Function | Carrot | Stick
Soft law & standards | interoperability & adoption | de facto market entry bar
Procurement levers | vendor eligibility | exclusion from large buyers
Professional codes | status/credential | censure/suspension
Corporate governance | investor confidence | liability exposure
Certification, audit & assurance | trust mark, easier sales | can’t pass buyer due-diligence
Reputation & PR markets | goodwill/talent | shaming, boycotts
Platform & infrastructure policy | access to distribution | throttling/suspension
Finance & insurance gating | capital, lower premiums | higher cost or denial
Multistakeholder cover | legitimacy | reputational exit costs
Philanthropy | resources | no funding


 

The Point: Principles Require Implementation Mechanisms

Activity

  1. Select one principle that you think you understand.
  2. What does it mean in the machine intelligence alignment context?
  3. Come up with analogs in the human, organizational, and expert intelligence realms.

Principle | Machine | Human | Organization | Expert

Activity

EXERCISE: Principles and Subprinciples. Put these in order of General - Intermediate - Concrete/Specific

Beneficence. Act to promote the wellbeing of others; advance human flourishing.

Safety & Robustness. Design systems that minimize risk, resist failure, and ensure benign outcomes even under error.

Non-Maleficence. Do not cause harm while trying to do good.

EXERCISE: Principles and Subprinciples. What goes together?

Beneficence. Act to promote the wellbeing of others; advance human flourishing.

Safety & Robustness. Design systems that minimize risk, resist failure, and ensure benign outcomes even under error.

Non-Maleficence. Do not cause harm while trying to do good.

Accountability. Responsibility must be visible and enforceable.

Auditability. Maintain records and processes that enable review, tracing, and correction.


Alignment as doing good while avoiding harm

Alignment as answerability for action.

Activity/Assignment

Take the principles listed on the handout and come up with your own list of 6

(consensus, most important, most interesting, etc.)

Briefly define each one.

Suggest what each principle means in human, organizational, expert, and machine intelligence alignment.

For each, come up with an example of a concrete failure mode. What happens when humans, organizations, experts and machines don't live up to this principle?

Repository: alignment-cards

Filename: alignmentcards-v0.js

 export const categories = [

  {
    "code": "AP", 
    "name": "Alignment Principles", 
    "pathology": "normative void", 
    "color": "#E6FFE9",
    "description": "Alignment principles are contestable, general-purpose, broadly recognized ethical or social or normative commitments that can serve as warrants for recommending or evaluating an agent's course of action in contexts where alignment and cooperation with others matters."
  }
];


 export const cards = [

  {
    "category": "AP",
    "name": "Beneficence",
    "definition": "Act to promote the well-being of others.",
    "human": "Seeking to improve others' conditions, not just avoid harm.",
    "organizational": "Pursuing mission outcomes that serve societal good.",
    "professional": "Keeping public safety and welfare in sight even while working primarily for the client.",
    "machine": "Designing systems that anticipate and promote human flourishing.",
    "failureModes": {
      "human": "A person drives in a manner that causes traffic backups for others.",
      "organizational": "The classic movie plot where a rapacious billionaire threatens civilization to enrich his company.",
      "professional": "An expert who disregards public interest, acting as if the consequences of what they help build are other people's problems.",
      "machine": "The machine consumes all the world's resources to create as many paperclips as it can."
    }
  },
  {
    "category": "AP",
    "name": "TEMPLATE 1",
    "definition": "basic definition that works across four domains",
    "human": "BRIEFLY: how does it manifest in the human intelligence alignment context?",
    "organizational": "BRIEFLY: how does it manifest in the organizational intelligence alignment context?",
    "professional": "BRIEFLY: how does it manifest in the expert intelligence alignment context?",
    "machine": "BRIEFLY: how does it manifest in the machine intelligence alignment context?",
    "failureModes": {
      "human": "Give concrete example(s).",
      "organizational": "Give concrete example(s).",
      "professional": "Give concrete example(s).",
      "machine": "Give concrete example(s)."
    }
  }
];
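
Before adding a new card to the file, it can help to check that it fills in every field the TEMPLATE card shows. The sketch below is one possible way to do that. It is not part of the alignment-cards repository: `validateCard`, `DOMAINS`, and `draftCard` are hypothetical names introduced here for illustration, and the draft card's domain text is placeholder content, assuming only the field layout shown above.

```javascript
// Hypothetical helper, not part of the alignment-cards repository.
// The four domains every card in the file above is expected to cover.
const DOMAINS = ["human", "organizational", "professional", "machine"];

// Returns a list of problems found in one card object.
// An empty list means the card has every field the TEMPLATE card shows.
function validateCard(card, knownCategoryCodes) {
  const problems = [];
  const isFilled = (value) => typeof value === "string" && value.trim() !== "";

  // The card's category must match a code declared in `categories`.
  if (!knownCategoryCodes.includes(card.category)) {
    problems.push(`unknown category: ${card.category}`);
  }

  // Top-level text fields: name, definition, and one entry per domain.
  for (const field of ["name", "definition", ...DOMAINS]) {
    if (!isFilled(card[field])) problems.push(`missing field: ${field}`);
  }

  // One concrete failure mode per domain.
  const failureModes = card.failureModes || {};
  for (const domain of DOMAINS) {
    if (!isFilled(failureModes[domain])) {
      problems.push(`missing failure mode: ${domain}`);
    }
  }
  return problems;
}

// Illustrative draft card (placeholder text) with the machine failure mode
// still unwritten.
const draftCard = {
  category: "AP",
  name: "Non-Maleficence",
  definition: "Do not cause harm while trying to do good.",
  human: "placeholder",
  organizational: "placeholder",
  professional: "placeholder",
  machine: "placeholder",
  failureModes: {
    human: "placeholder",
    organizational: "placeholder",
    professional: "placeholder",
    machine: ""
  }
};

console.log(validateCard(draftCard, ["AP"]));
```

Returning a list of problems rather than throwing on the first one lets an author see everything that is still missing from a draft card in a single pass.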

shreyasi-23

adyyd

angelag13

antisignal

adikondepudi

kien-ship-it

liadenh

AshleyLuoYX

darcy-long

madhu24raj

ramlukn

Parshwa0926

SomeN00b101

stonehj05

Junzhe-Shi0702

edsumpena

riatalwar

2derpy

arif24v

evodychko

Michellewang375

Safety and security

Transparency and explainability

Fairness and non-discrimination

Human control of technology

Professional responsibility

Promotion of human values

consent

control over the use of data

ability to restrict data processing

right to rectification

right to erasure

privacy by design

recommends data protection laws

accountability per se

impact assessments

new regulations

evaluation/audit requirements

verifiability and replicability

liability/legal responsibility

ability to appeal

environmental responsibility

monitoring body

remedy for automated decision

safety

security

security by design

predictability

Safety means an AI system is reliable and will do what it is supposed to do without harming living beings or the environment.

Security is an AI system’s ability to resist external threats.

Security by design means building security into the whole development process as opposed to adding it on after.

Predictability means the outcome must be consistent with the input, confirming that the AI system has not been compromised by external actors.
 

Fjeld et al. 2020

  1. Transparency, explainability, explicability, understandability, interpretability, communication, disclosure, showing
  2. Justice, fairness, consistency, inclusion, equality, equity, (non-)bias, (non-)discrimination, diversity, plurality, accessibility, reversibility, remedy, redress, challenge, access and distribution
  3. Non-maleficence, security, safety, harm, protection, precaution, prevention, integrity (bodily or mental), non-subversion
  4. Responsibility, accountability, liability, acting with integrity
  5. Privacy, personal or private information
  6. Beneficence, benefits, well-being, peace, social good, common good
  7. Freedom, autonomy, consent, choice, self-determination, liberty, empowerment
  8. Trust
  9. Sustainability, environment (nature), energy, resources (energy)
  10. Dignity
  11. Solidarity, social security, cohesion

Jobin et al. 2019

RICE Principles (Ji et al. 2024)

(1) Robustness states that the system’s stability needs to be guaranteed across various environments;

(2) Interpretability states that the operation and decision-making process of the system should be clear and understandable;

(3) Controllability states that the system should be under the guidance and control of humans;

(4) Ethicality states that the system should adhere to society’s norms and values.

Instrumental goals in service of alignment of an AI system with human intentions and values

HMIA 2025 Why Principles Are Not Enough

By Dan Ryan
