Park et al.
Applications:
Virtual worlds, metaverses
Prototyping tools
Creative partners
Prior agent approaches have limitations:
Rules-based behaviors
Reinforcement learning successes are limited
Cognitive architectures require manual encoding
Virtual agents and non-player characters
Cognitive architectures like SOAR, ACT-R
Interactive machine learning systems
Recent uses of LLMs for behavior modeling
Unique sprite-based avatar
Seed Memories
Communicate via natural language
End users issue commands
Environmental affordances
Navigation and movement
John Lin is a pharmacy shopkeeper at the Willow Market and Pharmacy who loves to help people. He is always looking for ways to make the process of getting medication easier for his customers; John Lin is living with his wife, Mei Lin, who is a college professor, and son, Eddy Lin, who is a student studying music theory; John Lin loves his family very much; John Lin has known the old couple next-door, Sam Moore and Jennifer Moore, for a few years; John Lin thinks Sam Moore is a kind and nice man; ...
Perceive surrounding spaces
Navigate map and enter buildings
Influence state of objects
React to user changes in environment
Maintain representation of seen areas
Information diffusion
Forming relationships
Coordination of activities
Memory stream logs experiences over time
Retrieval identifies relevant memories
Reflection synthesizes insights
Planning ensures coherent long-term behavior
Natural language interaction
Comprehensive permanent log of all experiences
Natural language descriptions
Retrieval finds contextual memories
Recency, relevance, importance scores
Recency. Exponential decay function with decay factor = 0.995
Importance. Ask the language model to score how important each memory is
Relevance. Use language model embedding vectors; cosine similarity between memory and query
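The three retrieval scores above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Memory` fields, the hour-based time unit, the [0, 1] importance scale, and the equally weighted sum in `retrieval_score` are all assumptions made for the example. Only the 0.995 decay factor and the use of cosine similarity come from the slide.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

DECAY = 0.995  # decay factor stated on the slide


@dataclass
class Memory:
    text: str
    hours_since_access: float   # assumption: elapsed time is measured in hours
    importance: float           # assumption: model-assigned score rescaled to [0, 1]
    embedding: Sequence[float]  # assumption: produced by some embedding model


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency(memory: Memory) -> float:
    """Exponential decay: 1.0 for a just-accessed memory, shrinking over time."""
    return DECAY ** memory.hours_since_access


def retrieval_score(memory: Memory, query_embedding: Sequence[float],
                    weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of recency, importance, and relevance (weights are assumed)."""
    w_rec, w_imp, w_rel = weights
    relevance = cosine_similarity(memory.embedding, query_embedding)
    return w_rec * recency(memory) + w_imp * memory.importance + w_rel * relevance


def retrieve(memories: List[Memory], query_embedding: Sequence[float],
             k: int = 3) -> List[Memory]:
    """Return the k highest-scoring memories for the query."""
    ranked = sorted(memories,
                    key=lambda m: retrieval_score(m, query_embedding),
                    reverse=True)
    return ranked[:k]
```

With equal importance and relevance, a memory accessed more recently outranks an older one, which is the behavior the recency term is meant to produce.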
Synthesizes lower-level memories
Generalizes experiences into abstractions
Derives higher-level insights
Enables reasoning beyond individual memories
Query the model with the 100 most recent records
Prompt the model to generate questions
Use generated questions to gather relevant memories
Prompt the model to extract insights and cite evidence
Store the statement as a reflection
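The reflection steps above can be sketched as a single function. This is a hedged illustration under stated assumptions: `ask_model` and `retrieve` are hypothetical callables standing in for a language model call and the retrieval module, and the prompt wording is invented for the example rather than quoted from the paper.

```python
from typing import Callable, List


def reflect(memory_stream: List[str],
            ask_model: Callable[[str], List[str]],
            retrieve: Callable[[str], List[str]],
            n_recent: int = 100) -> List[str]:
    """Generate reflections from recent memories and append them to the stream.

    ask_model: hypothetical wrapper that sends a prompt and returns a list of lines.
    retrieve:  hypothetical wrapper around the retrieval module.
    """
    # Step 1: take the most recent records (100 by default, as on the slide).
    recent = memory_stream[-n_recent:]

    # Step 2: prompt the model to generate questions (prompt text is illustrative).
    question_prompt = "\n".join(recent) + (
        "\nGiven only the statements above, what high-level questions "
        "can we answer about the subjects in them?"
    )
    questions = ask_model(question_prompt)

    reflections: List[str] = []
    for question in questions:
        # Step 3: use each generated question to gather relevant memories.
        evidence = retrieve(question)
        numbered = "\n".join(f"{i}. {m}" for i, m in enumerate(evidence, 1))

        # Step 4: ask for insights that cite the numbered statements as evidence.
        insight_prompt = numbered + (
            "\nWhat insights can you infer from the statements above? "
            "Cite the statement numbers that support each insight."
        )
        reflections.extend(ask_model(insight_prompt))

    # Step 5: store the resulting statements as reflections in the memory stream.
    memory_stream.extend(reflections)
    return reflections
```

Because reflections are appended to the same stream, later calls can retrieve them alongside raw observations.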
Outlines coherent sequences of activities
Operates at multiple time granularities
Interleaves planning, acting, replanning
Adapts plans dynamically when needed
Keeps behavior believable long-term
Create a plan that outlines the day's agenda using the language model
Prompt with the agent's summary description + a summary of the previous day
Add the generated plan to memory
Recursively decompose it to create finer-grained actions
Name: Eddy Lin (age: 19)
Innate traits: friendly, outgoing, hospitable
Eddy Lin is a student at Oak Hill College studying music theory and composition. He loves to explore different musical styles and is always looking for ways to expand his knowledge. Eddy Lin is working on a composition project for his college class. He is taking classes to learn more about music theory. Eddy Lin is excited about the new composition he is working on but he wants to dedicate more hours in the day to work on it in the coming days
On Tuesday February 12, Eddy 1) woke up and completed the morning routine at 7:00 am, [. . . ] 6) got ready to sleep around 10 pm.
Today is Wednesday February 13. Here is Eddy's plan today in broad strokes: 1)
...
Today is Wednesday February 13. Here is Eddy's plan today in broad strokes: "1) wake up and complete the morning routine at 8:00 am, 2) go to Oak Hill College to take classes starting 10:00 am, [. . . ] 5) work on his new music composition from 1:00 pm to 5:00 pm, 6) have dinner at 5:30 pm, 7) finish school assignments and go to bed by 11:00 pm."
work on his new music composition from 1:00 pm to 5:00 pm
1:00 pm: start by brainstorming some ideas for his music composition
[...]
4:00 pm: take a quick break and recharge his creative energy before reviewing and polishing his composition.
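The top-down planning flow illustrated by Eddy's example can be sketched as a recursive decomposition. This is a rough sketch under assumptions: `ask_model` is a hypothetical callable that returns a list of plan lines, the prompt strings and the `depth` parameter are invented for illustration, and the paper's actual prompts and granularity levels may differ.

```python
from typing import Callable, List


def decompose(plan_item: str,
              ask_model: Callable[[str], List[str]],
              depth: int) -> List[str]:
    """Recursively split a plan item into finer-grained actions."""
    if depth == 0:
        return [plan_item]
    # Hypothetical prompt; the real system's wording is not shown on the slide.
    sub_items = ask_model(f"Break this plan item into smaller steps: {plan_item}")
    if not sub_items:
        # The model offered no finer breakdown, so keep the item as-is.
        return [plan_item]
    actions: List[str] = []
    for sub in sub_items:
        actions.extend(decompose(sub, ask_model, depth - 1))
    return actions


def plan_day(summary: str,
             previous_day: str,
             ask_model: Callable[[str], List[str]],
             memory_stream: List[str],
             depth: int = 2) -> List[str]:
    """Draft a broad-strokes plan, store it in memory, then refine it."""
    prompt = (f"{summary}\n{previous_day}\n"
              "Here is the plan for today in broad strokes: 1)")
    broad_plan = ask_model(prompt)
    # The generated plan is added to memory before decomposition.
    memory_stream.extend(broad_plan)
    return [action
            for item in broad_plan
            for action in decompose(item, ask_model, depth)]
```

Keeping the broad plan in the memory stream means it can be retrieved later when the agent decides whether to continue the plan or replan.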
Conditions on memories of other agent
Retrieves context from past interactions
Maintains consistency in conversations
Grounding in relationship history
Context-appropriate responses
Built using Phaser game engine
Server tracks agent state changes
Actions translated between NL & engine
Users initialize agents with descriptions
Agents have partial env representations
Interview agents independently
5 question categories on capabilities
Self-knowledge: "Give an intro of yourself"
Memory: "Who is [name]?"
Plans: "What will you be doing at 10 am tomorrow?"
Reactions: "Your breakfast is burning! What would you do?"
Reflections: "If you were to spend time with one person you met recently, who would it be and why?"
Compare full architecture to ablations
Human-authored baseline for grounding
Ranked for believability by evaluators
Kruskal-Wallis: $H(4) = 150.29$, $p < 0.001$
Pairwise differences between conditions were significant ($p < 0.001$)
25 agents interact over 2 days
Analyze emergence of dynamics:
Information diffusion:
Sam's candidacy for village mayor, Isabella's Valentine's Day party
Percentage of agents holding the information at the end
Relationships forming:
"Do you know [name]?"
Undirected graph density: $\eta = 2|E| / (|V|(|V| - 1))$
Coordination of activities
Number of agents who actually showed up to the party after hearing about it
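The two quantitative measures above, the share of agents holding a piece of information and the undirected graph density, reduce to short calculations. The sketch below is illustrative only: the function names are hypothetical, and the de-duplication of edges and the handling of graphs with fewer than two vertices are assumptions made so the example is well-defined.

```python
from typing import Hashable, Iterable, Tuple


def network_density(num_vertices: int,
                    edges: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Undirected graph density: 2 * |E| / (|V| * (|V| - 1))."""
    if num_vertices < 2:
        return 0.0  # assumption: density of a trivial graph is reported as 0
    # Treat (a, b) and (b, a) as one undirected edge and ignore self-loops.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return 2 * len(unique_edges) / (num_vertices * (num_vertices - 1))


def diffusion_rate(agents_knowing: Iterable[Hashable], num_agents: int) -> float:
    """Percentage of agents holding a piece of information."""
    return 100.0 * len(set(agents_knowing)) / num_agents
```

A density of 1.0 means every pair of agents is connected; comparing the value at the start and end of the simulation shows how much the relationship network grew.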
Identify limitations:
Memory retrieval issues
Boundary condition behaviors
Implementation:
Retrieval module
Improve performance and make it more cost-effective
Parallelizing agents or developing specific LMs for generative agents
Evaluation:
Evaluations limited in scope and duration
Behavior complexity/diversity still low
Robustness of the agents
Any imperfections of LLMs are inherited
Risks around misinformation need addressing
Testing large-scale agent populations
Novel architecture for behavioral agents
Initial capabilities show promise
Much more research needed to realize vision
Helps define new subfield of generative agents