["Genie 2: A large-scale foundation model" Parker-Holder et al (2024)]
["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]
Probabilistic ML has made high dimensional inference tractable
1024x1024xTime
["Genie 3: A new frontier for world models" Parker-Holder et al (2025)]
Model Misspecification
Shared Information
Private Information
- Shared + Private
Simulation-Based Inference in Cosmology
True
Reconstructed
Idealized Simulations
Observations
+ Scale Dependent Noise
+ Bump
Representation Learning
Physics
Systematics
Amplitude
Tilt
Tilt
- Shared
- Private
[arXiv:2503.15312]
[https://metr.org/blog/2025-07-14-how-does-time-horizon-vary-across-domains/]
Observation
Question
Hypothesis
Testable Predictions
Gather data
Alter, Expand, Reject Hypothesis
Develop General Theories
[Figure adapted from ArchonMagnus]
High-dimensional data
Simulators as theory models
The Universe accelerates!
The Universe expands, it should decelerate
What is the ultimate fate of the Universe?
Need a repulsive dark energy component
Measure supernovae redshifts
Matter domination -> the Universe decelerates: rate?
Distance-redshift relation via standard candles
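The supernova argument above can be made concrete with a small numerical sketch. It assumes a flat universe containing only matter and a cosmological constant, and an illustrative Hubble constant of 70 km/s/Mpc. The function names and parameter values are hypothetical choices for illustration, not taken from the talk.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance(z, omega_m, h0=70.0, n_steps=2048):
    """Luminosity distance in Mpc for a flat matter + cosmological-constant universe."""
    omega_de = 1.0 - omega_m  # flatness assumption
    zs = np.linspace(0.0, z, n_steps)
    # Dimensionless expansion rate E(z) = H(z) / H0
    e_z = np.sqrt(omega_m * (1.0 + zs) ** 3 + omega_de)
    integrand = 1.0 / e_z
    # Trapezoid rule for the comoving distance integral
    dz = zs[1] - zs[0]
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1])) * dz
    return (1.0 + z) * (C_KM_S / h0) * integral


def distance_modulus(z, omega_m, h0=70.0):
    """Distance modulus mu = 5 log10(d_L / Mpc) + 25, the quantity standard candles measure."""
    return 5.0 * np.log10(luminosity_distance(z, omega_m, h0)) + 25.0


# Matter-only (decelerating) versus matter + dark energy (accelerating)
mu_matter_only = distance_modulus(0.5, omega_m=1.0)
mu_with_de = distance_modulus(0.5, omega_m=0.3)
print(mu_with_de - mu_matter_only)
```

In this sketch the model with a dark energy component predicts a larger distance modulus at the same redshift, so supernovae appear fainter than a matter-only universe would predict.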
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]
Dark Energy is constant over time
DM-DE Interactions
[arXiv:2503.14743]
Phantom Crossing
Violates Null Energy Condition
[arXiv:2503.16415]
Change in dark matter mass
["An LLM-driven framework for cosmological model-building and exploration" Mudur, Cuesta-Lazaro, Toomey (in prep)]
Propose a model for Dark Energy
Implement it in a Cosmology simulation code: CLASS
Test fit to DESI Observations
Iterate to improve fit
Quintessence, DE/DM interactions, ...
Must pass a set of general tests for "reasonable" models
Ideally, compare the Bayesian evidence to LCDM.
For now, use the Bayesian Information Criterion (BIC)
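As a minimal sketch of the BIC comparison, assume a Gaussian likelihood, so that -2 ln L_max equals the best-fit chi-squared up to a model-independent constant. The function names and the numbers in the usage lines are hypothetical placeholders, not results from the talk.

```python
import math


def bic(chi2_min, n_params, n_data):
    """BIC = chi2_min + k * ln(n), valid up to a constant for Gaussian likelihoods."""
    return chi2_min + n_params * math.log(n_data)


def delta_bic(chi2_model, k_model, chi2_lcdm, k_lcdm, n_data):
    """Negative values favor the proposed model over LCDM."""
    return bic(chi2_model, k_model, n_data) - bic(chi2_lcdm, k_lcdm, n_data)


# Hypothetical numbers: one extra parameter improves chi-squared by 2 on 100 data points
print(delta_bic(chi2_model=8.0, k_model=3, chi2_lcdm=10.0, k_lcdm=2, n_data=100))
```

With these placeholder numbers the penalty for the extra parameter, ln(100), outweighs the improvement in fit, so the difference is positive and LCDM is still preferred.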
1
2
Nayantara Mudur (Harvard)
Thawing Quintessence
Axion-like Early Dark Energy
Ultra-light scalar field that temporarily acts as dark energy in the early universe
Implementation Challenge:
Dynamic dark energy model: scalar field transitions from "frozen" (cosmological constant-like) to evolving as the universe expands.
Oscillatory behaviour
Can take advantage of existing scalar field implementations in CLASS
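For reference, the "frozen" to "evolving" behaviour described above follows from the standard equations for a homogeneous scalar field in an expanding background. The symbols below (field phi, potential V, Hubble rate H) are the usual textbook notation, not notation taken from the slides.

```latex
\ddot{\phi} + 3H\dot{\phi} + \frac{dV}{d\phi} = 0,
\qquad
w_\phi = \frac{\tfrac{1}{2}\dot{\phi}^2 - V(\phi)}{\tfrac{1}{2}\dot{\phi}^2 + V(\phi)}
```

While the Hubble friction term dominates, the field barely moves, the kinetic term is negligible and w is close to -1. As H drops the field starts to roll and w departs from -1.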
+ 43,000 lines of C code
+ 10,000 lines of numerical files
CLASS Challenge:
1) Code compiles + passes unit tests (reasonable observables, numerical convergence...)
2) Implementation agrees with target repository
3) Goodness of fit for DESI + Supernovae
4) H0 tension metrics
Curated
One-page description of the model to be implemented, CLASS tips, and very explicit units
Paper
Directly from a full paper
If it fails, get feedback from another LLM
Shortcut: field that produces this?
Asked for physical motivation. It tried :(
Not true, preferred scale
Reinforcement Learning
Update the base model weights to optimize scalar reward(s)
DeepSeek R1
Base LLM
(being updated)
Base LLM
(frozen)
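One way a scalar reward becomes a training signal is the group-relative normalisation used by GRPO, the RL method used to train DeepSeek R1. The sketch below shows only that normalisation step. It omits the policy update and the penalty for drifting from a frozen reference model, and the function name and example rewards are hypothetical.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of several rollouts for the same prompt.

    Each rollout's advantage is its reward minus the group mean,
    divided by the group standard deviation.
    """
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


# Four rollouts for one prompt: two pass the check (reward 1), two fail (reward 0)
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Rollouts that score above the group average get a positive advantage and are reinforced. If every rollout receives the same reward, all advantages are zero and the group gives no learning signal.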
Develop basic skills: numerics, theoretical physics, experimentation...
Community Effort!
1. Design next Experiment
2. Hypothesize equation of motion
3. Simulate and Compare
Evolutionary algorithms
Learning in natural language, reflect on traces and results
Examples: EvoPrompt, FunSearch, AlphaEvolve
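The common skeleton behind these methods is a loop of propose, score and select. The sketch below is a toy numerical stand-in: in the systems named above the mutation step is an LLM rewriting a prompt or program, whereas here it is Gaussian noise on a single number. All names and values are hypothetical.

```python
import random


def evolve(fitness, mutate, seed_population, generations=50, survivors=4, rng=None):
    """Elitist evolutionary loop: mutate every candidate, keep the best few."""
    rng = rng or random.Random(0)
    population = list(seed_population)
    for _ in range(generations):
        # Each current candidate proposes one mutated child
        children = [mutate(parent, rng) for parent in population]
        # Keep the top candidates among parents and children
        population = sorted(population + children, key=fitness, reverse=True)[:survivors]
    return population[0]


# Toy problem: find the equation-of-state value w that maximizes a fitness peaked at w = -1
def toy_fitness(w):
    return -(w + 1.0) ** 2


def toy_mutate(w, rng):
    return w + rng.gauss(0.0, 0.1)


best = evolve(toy_fitness, toy_mutate, seed_population=[0.0, -0.5, -2.0, 0.3])
print(best)
```

Because parents compete with their children for survival, the best fitness in the population never decreases from one generation to the next.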
["GEPA: Reflective prompt evolution can outperform reinforcement learning" Agrawal et al]
GEPA: Evolutionary
GRPO: RL
+10% improvement over RL with 35x fewer rollouts
Scientific reasoning with LLMs is still in its infancy!
3. Science is ultimately a human endeavor: which questions are interesting to answer, and may be solvable, is up to us. What role can LLMs play in science?
1. LLMs are improving on most subjects at an insane rate, including maths
What problems in physics can we tackle with automated code generation?
Can generally make simulators more controllable!
Artificial Muses
2. How do we improve their physics reasoning skills?
RL over simulated worlds
Science is not so amenable to a "scalar reward" setup
"Play" is important