How to Read Hard Stuff
HMIA 2025

"Readings"
PRE-CLASS
CLASS
HMIA 2025
YOU ARE HERE

First "homework": draft cards on GitHub
HMIA 2025
PRE-CLASS
HMIA 2025
CLASS
First Pass
- Carefully read the title, abstract, and introduction
- Read the section and sub-section headings, but ignore everything else
- Read the conclusions
- Glance over the references, mentally ticking off the ones you’ve already read
Second Pass
- Read with greater care, but ignore details such as proofs. Jot down the key points, or make comments in the margins, as you read.
- Look carefully at the figures, diagrams and other illustrations in the paper.
- Pay special attention to graphs. Can you ascertain why they are there and what they purport to show?
- Mark relevant unread references for further reading.
Can you articulate:
1. Category: What type of paper is this?
2. Context: Which other papers is it related to?
3. Correctness: Do the assumptions appear to be valid?
4. Contributions: What are the paper’s main contributions?
5. Clarity: Is the paper well written?
Keshav Method
Third Pass
- Attempt to "re-implement" the paper: make the same assumptions, re-create the work.
- Compare this re-creation with the actual paper to identify the paper's innovations, hidden flaws, and assumptions.
- Think about how you yourself would present each idea.

How to Read Hard Stuff
based on protips from Vern Bengston, USC
Dan Ryan
9.21
CONTENTS
SQ3R + 25W: Survey, Query, Read, Recite, Review, + 25 words
Survey
ARTICLE: Assess the size. Realistically estimate reading time. Read abstract and first paragraph. Skim conclusion. Flip through. Write out section headings if appropriate. Sketch a mind map?
5 minutes
STOP
Survey
BOOK: Assess the size. Study the table of contents. Skim introduction and conclusion, first chapter and last. Flip through the book. Skim the index.
5 minutes
STOP


Survey
Big book! 330 pages at ~350-400 words/page: roughly 115,000-132,000 words.
At 200-250 words/minute: about 8-11 hours reading time.
Split into three parts: social issues?? technical?? AI stuff??
Flipping through:
- Google tool produces funky word-meaning results
- Using AI for criminal justice stuff; the ProPublica scandal
- Weird AI video game result
- Machine learning and human values: both a technical and a social problem
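A quick back-of-the-envelope sketch of that reading-time estimate (just the rough figures above, nothing taken from the book itself):

// Rough reading-time estimate: 330 pages at 350-400 words/page, read at 200-250 wpm.
const pages = 330;
const wordsLow = 350 * pages;            // 115,500 words
const wordsHigh = 400 * pages;           // 132,000 words
const hoursLow = wordsLow / 250 / 60;    // fast reader: ~7.7 hours
const hoursHigh = wordsHigh / 200 / 60;  // slow reader: ~11 hours
console.log(`${wordsLow}-${wordsHigh} words, roughly ${hoursLow.toFixed(1)}-${hoursHigh.toFixed(1)} hours`);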
Survey
Look over the Conclusion
We must take caution that we do not find ourselves in a world where our systems do not allow what they cannot imagine - where they, in effect, enforce the limits of their own understanding. (327)
Survey

FROM THE ABSTRACT
Impacts of AI on society; accidents are one type of impact.
Five practical problems, grouped by source:
- Source: wrong objective function → side effects, reward hacking
- Source: objective function hard to evaluate → scalable supervision
- Source: bad behavior during learning → safe exploration
- Source: world changes, new world → distributional shift

Survey
Paper Sections
- 29 pages (only 21 of text)
- Section 2 is an overview
- Sections 3, 4, 5, 6, and 7: one for each problem
- Each section has under one page explaining the problem and then a page or more on how the problem gets addressed
Query
What is the major point (or finding)?
What are the major questions this piece wants to answer?
What do I want from this?
WRITE IT DOWN. DON'T refer back to the manuscript. DO focus on "first impressions."
2 minutes
STOP

Query
Concrete Problems
They want to categorize the field of "AI safety problems": their descriptions and research suggestions.
Overview of things that can go wrong with AI and an attempt to systematize thinking about problems and solutions and to motivate research on these. The authors are trying to get their colleagues to take accidents/safety seriously.
I want to start to get a map of the alignment landscape.

Query
The Alignment Problem
Book explains the state of the field, what CS people are thinking about and what "critics" are thinking about. I can get an overview and a sense of where the field is headed and, I think, come to understand some technical details too.
Read
Annotate the hell out of everything you read.
Underline. Write in margins.
Perhaps eschew highlighters.
High-level outline or mind map.
Mutatis mutandis.
Recite
Close the book/article.
Major points made by the author.
Three things I should remember.
Do I have questions? Do I have criticisms?
2 minutes
STOP
Review
Did your recitation get it right?
Questions answered? Do your criticisms still stand?
Look back over text and notes.
STOP
5 minutes
25 Words
Write a 25-word summary of what the piece says and why it matters for this project or assignment.
Extra: draw the argument
5 minutes
25 Words: Concrete Problems
AI accidents are a real concern. Problems and solutions can be systematically described: reward hacking, scalable oversight, safe exploration, side effects, and distribution shift.
25 Words: The Alignment Problem
AI alignment means the "values" of machines are consistent with human values. Lots happening recently and lots of problems still to solve.
- Alignment
  - Accidents
    - Side Effects
    - Reward Hacking
    - Scalable Oversight
    - Safe Exploration
    - Distributional Shift
- Other
  - Privacy
  - Security
  - Fairness
  - Economics
  - Policy
NEVER JUST PLOW THROUGH FROM FIRST WORD TO LAST
Concrete Problems of AI Alignment
A. Avoiding Side Effects
B. Scalable Oversight
C. Safe Exploration
D. Robustness for Domain Shift
E. Avoid Reward Hacking
1. Everyone gets A, B, C, D, or E.
2. Write your "Lyft driver" explanation.
3. Generate human, organization, and expert examples.
4. Collaborate
Concrete Problems of Alignment

| | Interpersonal, Social | Professional, Expert | Organizational |
|---|---|---|---|
| Avoiding Side Effects | | | |
| Scalable Oversight | | | |
| Safe Exploration | | | |
| Robustness for Domain Shift | | | |
| Avoid Reward Hacking | | | |
BOARD
GitHub
Introduction to GitHub course on GitHub
GitHub git tutorial in under 10 minutes (via reddit)
and there are a few others out there
GitHub
1. Create a GitHub account (use a name that will make sense professionally)
2. Create a repository called alignment-cards-test
3. Create a file called prototype0.js



JSON
What is JSON: understanding syntax, storing data, examples + cheat sheet (5 min read)
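One distinction worth noting: prototype0.js below is a JavaScript module whose data happens to use JSON-style syntax. Written as plain JSON instead (double-quoted keys, no export statement, no trailing commas), a single placeholder card might look like this sketch, using the same placeholder fields as the template:

{
  "cards": [
    {
      "name": "xxx",
      "definition": "xxx",
      "failureMode": "xxx",
      "example": "xxx"
    }
  ]
}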



export const cards = [
{
"name": "Behavioral Alignment",
"definition": "Ensuring that an AI system behaves as the human would want it to behave.",
"failureMode": "The system takes actions that technically follow instructions but violate user intent.",
"example": "The boat AI spins in circles collecting points instead of racing to win."
},
{
"name": "Intent Alignment",
"definition": "Ensuring that the AI system’s behavior reflects the human’s intended goals.",
"failureMode": "The system optimizes for explicit instructions without inferring the underlying goal.",
"example": "Rewarding for score led the agent to maximize points, not race outcomes."
},
{
"name": "Specification Alignment",
"definition": "Ensuring that formal objectives (like reward functions) match true human goals.",
"failureMode": "The proxy (e.g. score) is easier to specify than the real objective (e.g. race performance).",
"example": "Amodei optimized for game score and got unintended, exploitative behavior."
},
{
"name": "Value Alignment",
"definition": "Ensuring that AI systems respect and reflect human moral values and norms.",
"failureMode": "The system produces outcomes that are statistically efficient but ethically harmful.",
"example": "COMPAS scores showed racial bias in criminal justice risk assessment."
},
{
"name": "Societal Alignment",
"definition": "Ensuring that AI systems deployed in institutions align with democratic and public values.",
"failureMode": "Opaque systems make high-stakes decisions without accountability or recourse.",
"example": "Judges using closed-source risk scores with no explanation or audit."
}
]
export const cards = [
{
"name": "xxx",
"definition": "xxx.",
"failureMode": "xxx.",
"example": "xxx."
}
]
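Once prototype0.js is in your repo, a small, hypothetical Node script along these lines can sanity-check that every card has the four fields from the template above (it assumes the repo's package.json sets "type": "module" so Node loads the .js file as an ES module):

// check-cards.mjs (hypothetical): run with `node check-cards.mjs` in the repo root.
// Assumes package.json contains "type": "module" so prototype0.js loads as an ES module.
import { cards } from './prototype0.js';

const required = ['name', 'definition', 'failureMode', 'example'];
for (const card of cards) {
  for (const field of required) {
    if (!card[field]) {
      console.warn(`Card "${card.name ?? '(unnamed)'}" is missing "${field}"`);
    }
  }
}
console.log(`${cards.length} card(s) checked.`);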
https://tinyurl.com/alignmentCards1?user=djjr

HMIA 2025 How to Read Hard Stuff
By Dan Ryan