"Readings"
PRE-CLASS
CLASS
YOU ARE HERE
First "homework": draft cards on GitHub
PRE-CLASS
CLASS
First Pass
Second Pass
Can you articulate:
1. Category: What type of paper is this?
2. Context: Which other papers is it related to?
3. Correctness: Do the assumptions appear to be valid?
4. Contributions: What are the paper’s main contributions?
5. Clarity: Is the paper well written?
Keshav Method
Third Pass
based on
Protips from Vern Bengtson, USC
Dan Ryan
9.21
CONTENTS
ARTICLE: Assess the size. Realistically estimate reading time. Read abstract and first paragraph. Skim conclusion. Flip through. Write out section headings if appropriate. Sketch a mind map?
BOOK: Assess the size. Study the table of contents. Skim introduction and conclusion, first chapter and last. Flip through book. Skim the index.
~350-400 words/page
Big Book! 330 pages
200-250 words/minute
8-11 hours reading time
115,000-132,000 words
Split into three parts
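A quick sketch of the arithmetic behind that estimate; the page count, words-per-page, and words-per-minute figures are just the rough assumptions above, not measurements:

// Back-of-the-envelope reading-time estimate (all figures are the slide's rough assumptions)
const pages = 330;
const wordsPerPage = [350, 400];   // assumed low and high
const readingSpeed = [250, 200];   // words per minute: fast reader, slow reader

const totalWords = wordsPerPage.map(wpp => pages * wpp);            // ~115,500 to ~132,000 words
const hours = totalWords.map((w, i) => w / readingSpeed[i] / 60);   // ~7.7 to ~11 hours

console.log(totalWords, hours.map(h => h.toFixed(1)));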
social issues??
technical??
AI stuff??
Google tool produces funky word meaning results.
Using AI for criminal justice. ProPublica scandal.
Weird AI video game result
machine learning and human values
both a technical and a social problem
We must take caution that we do not find ourselves in a world where our systems do not allow what they cannot imagine - where they, in effect, enforce the limits of their own understanding. (327)
Impacts of AI on society
Accidents one type of impact
Five Practical Problems
Source: wrong objective function
Source: objective function hard to evaluate
Source: bad behavior during learning
side effects
reward hacking
safe exploration
distributional shift
scalable supervision
FROM THE ABSTRACT
Paper Sections
Source: world changes, new world
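One way to keep the mapping straight is as a little lookup object; the grouping of problems under sources follows these slides, but the code itself is just an illustration:

// Rough sketch (not from the paper): which failure source pairs with which
// of the five problems, as laid out on these slides.
export const problemSources = {
  "wrong objective function": ["avoiding side effects", "avoiding reward hacking"],
  "objective function hard to evaluate": ["scalable supervision"],
  "bad behavior during learning": ["safe exploration"],
  "world changes, new world": ["distributional shift"]
};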
What is the major point (or finding)?
2 minutes
What are the major questions this piece wants to answer?
WRITE IT DOWN. DON'T refer back to the manuscript. DO focus on "first impressions."
What do I want from this?
They want to categorize the field of "AI safety problems."
Their descriptions and research suggestions.
I want to start to get a map of the alignment landscape.
What is the major point (or finding)?
2 minutes
What are the major questions this piece wants to answer?
WRITE IT DOWN. DON'T refer back to the manuscript. DO focus on "first impressions."
What do I want from this?
Book explains state of the field, what CS people are thinking about and what "critics" are thinking about. I can get an overview and sense of where the field is headed and I think come to understand some technical details too.
Overview of things that can go wrong with AI and an attempt to systematize thinking about problems and solutions and to motivate research on these. Authors are trying to get their colleagues to take accidents/safety seriously.
Annotate the hell out of everything you read.
Underline. Write in margins.
High-level outline or mind map.
Perhaps eschew highlighters.
Do I have questions?
2 minutes
Three things I should remember
Major points made by author
Do I have criticisms?
Did your recitation get it right?
Questions answered? Do your criticisms still stand?
Look back over text and notes.
STOP
5 minutes
Write a 25-word summary of what the piece says and why it matters for this project or assignment.
Extra: draw the argument
5 minutes
5 minutes
AI accidents are a real concern. Problems and solutions can be systematically described: reward hacking, scalable oversight, safe exploration, side effects, and distributional shift.
AI alignment means "values" of machines are consistent with human values. Lots happening recently and lots of problems still to solve.
- Alignment
  - Accidents
    - Side Effects
    - Reward Hacking
    - Scalable Oversight
    - Safe Exploration
    - Distributional Shift
- Other
  - Privacy
  - Security
  - Fairness
  - Economics
  - Policy
JUST PLOW THROUGH FROM FIRST WORD TO LAST
1. Everyone gets A, B, C, D, or E
2. Write your "Lyft driver" explanation.
3. Generate human, organization, expert examples
4. Collaborate
| | Interpersonal, Social | Professional, Expert | Organizational |
|---|---|---|---|
| Avoiding Side Effects | | | |
| Scalable Oversight | | | |
| Safe Exploration | | | |
| Robustness for Domain Shift | | | |
| Avoid Reward Hacking | | | |
BOARD
Introduction to GitHub course on GitHub
GitHub git tutorial in under 10 minutes (via Reddit)
and there are a few others out there
1. Create a GitHub account (use a name that will make sense professionally)
2. Create a repository called alignment-cards-test
3. Create a file called prototype0.js
What is JSON: understanding syntax, storing data, examples + cheat sheet (5 min read)
export const cards = [
{
"name": "Behavioral Alignment",
"definition": "Ensuring that an AI system behaves as the human would want it to behave.",
"failureMode": "The system takes actions that technically follow instructions but violate user intent.",
"example": "The boat AI spins in circles collecting points instead of racing to win."
},
{
"name": "Intent Alignment",
"definition": "Ensuring that the AI system’s behavior reflects the human’s intended goals.",
"failureMode": "The system optimizes for explicit instructions without inferring the underlying goal.",
"example": "Rewarding for score led the agent to maximize points, not race outcomes."
},
{
"name": "Specification Alignment",
"definition": "Ensuring that formal objectives (like reward functions) match true human goals.",
"failureMode": "The proxy (e.g. score) is easier to specify than the real objective (e.g. race performance).",
"example": "Amodei optimized for game score and got unintended, exploitative behavior."
},
{
"name": "Value Alignment",
"definition": "Ensuring that AI systems respect and reflect human moral values and norms.",
"failureMode": "The system produces outcomes that are statistically efficient but ethically harmful.",
"example": "COMPAS scores showed racial bias in criminal justice risk assessment."
},
{
"name": "Societal Alignment",
"definition": "Ensuring that AI systems deployed in institutions align with democratic and public values.",
"failureMode": "Opaque systems make high-stakes decisions without accountability or recourse.",
"example": "Judges using closed-source risk scores with no explanation or audit."
}
]
export const cards = [
{
"name": "xxx",
"definition": "xxx.",
"failureMode": "xxx.",
"example": "xxx."
}
]
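A minimal sketch of how the exported array might be read back once prototype0.js exists. The file name comes from step 3 above; the script name and output format are just illustrative, not part of the assignment:

// quiz.js (illustrative): print each card as a flashcard-style prompt.
// Assumes the repo is set up for ES modules (e.g., "type": "module" in package.json).
import { cards } from './prototype0.js';

for (const card of cards) {
  console.log(card.name);
  console.log(`  Definition:   ${card.definition}`);
  console.log(`  Failure mode: ${card.failureMode}`);
  console.log(`  Example:      ${card.example}`);
}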
https://tinyurl.com/alignmentCards1?user=djjr