Observation
Question
Hypothesis
Testable Predictions
Gather data
Alter, Expand, Reject Hypothesis
Develop General Theories
[Figure adapted from ArchonMagnus]
High-dimensional data
Simulators as theory models
The Universe accelerates!
The Universe expands, it should decelerate
What is the ultimate fate of the Universe?
Need a repulsive dark energy component
Measure supernovae redshifts
Matter domination -> the Universe decelerates: rate?
Distance-redshift relation via standard candles
Simulations?
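The distance-redshift relation behind the supernova measurement can be sketched numerically. The block below is a minimal illustration, assuming a flat universe containing only matter and a cosmological constant. The function names and default values (H0 = 70 km/s/Mpc, Omega_m = 0.3) are placeholders chosen for the example, not values from the talk.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def hubble_ratio(z, omega_m):
    """E(z) = H(z)/H0 for a flat universe with matter and a cosmological constant."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=1000):
    """Luminosity distance in Mpc via trapezoidal integration of 1/E(z)."""
    dz = z / n_steps
    integral = 0.0
    for i in range(n_steps):
        z_lo, z_hi = i * dz, (i + 1) * dz
        integral += 0.5 * dz * (
            1.0 / hubble_ratio(z_lo, omega_m) + 1.0 / hubble_ratio(z_hi, omega_m)
        )
    comoving = (C_KM_S / h0) * integral  # comoving distance in Mpc
    return (1.0 + z) * comoving


def distance_modulus(z, h0=70.0, omega_m=0.3):
    """mu = 5 log10(d_L / 10 pc), with d_L converted from Mpc to pc."""
    d_l_pc = luminosity_distance_mpc(z, h0, omega_m) * 1.0e6
    return 5.0 * math.log10(d_l_pc / 10.0)


# Same redshift, two hypothetical universes.
mu_matter_only = distance_modulus(0.5, omega_m=1.0)  # decelerating
mu_with_lambda = distance_modulus(0.5, omega_m=0.3)  # late-time acceleration
```

At fixed redshift, a standard candle looks fainter (larger distance modulus) in the universe with a cosmological constant than in the matter-only one, which is the signature the supernova data revealed.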
"Semantic" lower dimensional representation
["On the Opportunities and Risks of Foundation Models" Bommasani et al]
Simulated Data
Observed Data
Alignment Loss
Reconstruction
Alignment
(OT / Adversarial)
Shared Decoder
Observed Reconstructed
Simulated Reconstructed
Idealized Simulations
Observations
+ Scale Dependent Noise
+ Bump
["Disentangling Foundation Models for Science: Robust Integration of Simulated and Observed Data" Cuesta-Lazaro, Alvarez-Melis (in-prep)]
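The diagram labels above (reconstruction, alignment loss, shared decoder) can be read as a two-term objective. The block below is a rough sketch of that idea, not the authors' implementation: it assumes separate encoders for simulated and observed data, and it swaps in a sliced Wasserstein distance as a simple stand-in for the "OT" alignment option. All function names and the toy data are hypothetical.

```python
import math
import random


def mse(xs, ys):
    """Mean squared error between two equally sized batches of vectors."""
    total = sum((a - b) ** 2 for x, y in zip(xs, ys) for a, b in zip(x, y))
    return total / (len(xs) * len(xs[0]))


def sliced_wasserstein(za, zb, n_proj=64, seed=0):
    """Average squared 1D Wasserstein-2 distance over random projections.

    Assumes both latent batches contain the same number of vectors.
    """
    rng = random.Random(seed)
    dim = len(za[0])
    total = 0.0
    for _ in range(n_proj):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        pa = sorted(sum(d * v for d, v in zip(direction, z)) for z in za)
        pb = sorted(sum(d * v for d, v in zip(direction, z)) for z in zb)
        total += sum((a - b) ** 2 for a, b in zip(pa, pb)) / len(pa)
    return total / n_proj


def total_loss(x_sim, x_obs, enc_sim, enc_obs, decoder, weight=1.0):
    """Reconstruction through a shared decoder plus latent alignment."""
    z_sim = [enc_sim(x) for x in x_sim]
    z_obs = [enc_obs(x) for x in x_obs]
    recon = mse([decoder(z) for z in z_sim], x_sim)
    recon += mse([decoder(z) for z in z_obs], x_obs)
    align = sliced_wasserstein(z_sim, z_obs)
    return recon + weight * align, recon, align


# Toy data: "observations" are the simulations plus a systematic offset.
rng = random.Random(1)
x_sim = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
x_obs = [[a + 2.0, b] for a, b in x_sim]


def identity(v):
    return list(v)


def remove_offset(v):
    return [v[0] - 2.0, v[1]]


loss_id, recon_id, align_id = total_loss(x_sim, x_obs, identity, identity, identity)
loss_sh, recon_sh, align_sh = total_loss(x_sim, x_obs, identity, remove_offset, identity)
```

With identity encoders the reconstruction is perfect but the latents are misaligned; removing the offset in the observed encoder aligns the latents but makes the shared decoder miss the offset. The weight on the alignment term controls that trade-off.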
Amplitude
Tilt
Tilt
Late Universe
Early Universe
Tension
Early vs Late
Parametric Extensions
[Image Credit: Prof. Wendy Freedman]
The missing pieces: Beyond parametric searches
Axion Dark Matter
Dark Matter - Baryon Interactions
Primordial Non-Gaussianity
Early Dark Energy
Dark Radiation
[Credit: Sandbox Studio]
["CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching" Kannan, Qiu, Cuesta-Lazaro, Jeong]
Sid Kannan (UCSB)
["Detecting Model Misspecification in Cosmology with Scale-Dependent Normalizing Flows" Akhmetzhanova, Cuesta-Lazaro, Mishra-Sharma]
[Figure: Parameter Inference Bias (Supervised) and OOD Metric (Unsupervised) for Base, OOD Mock 1, and OOD Mock 2, shown at Large Scales and Small Scales]
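An unsupervised out-of-distribution (OOD) metric of this kind can be illustrated with a likelihood score computed separately per scale. The sketch below is only a stand-in: it fits a 1D Gaussian per scale where the cited work uses scale-dependent normalizing flows, and the scale labels, function names, and numbers are all made up for the example.

```python
import math


def fit_gaussian(samples):
    """Fit a 1D Gaussian (mean, variance) to a list of floats."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var


def mean_nll(samples, mean, var):
    """Average negative log-likelihood of samples under the fitted Gaussian."""
    total = 0.0
    for s in samples:
        total += 0.5 * math.log(2.0 * math.pi * var) + (s - mean) ** 2 / (2.0 * var)
    return total / len(samples)


def ood_score_per_scale(base, test):
    """Excess NLL of the test set over the base set, for each scale label."""
    scores = {}
    for scale, base_values in base.items():
        mean, var = fit_gaussian(base_values)
        scores[scale] = mean_nll(test[scale], mean, var) - mean_nll(base_values, mean, var)
    return scores


# Toy summaries (hypothetical numbers): the mock differs only at small scales.
base = {
    "large": [0.90, 1.00, 1.10, 1.00, 0.95, 1.05],
    "small": [0.90, 1.00, 1.10, 1.00, 0.95, 1.05],
}
ood_mock = {
    "large": list(base["large"]),
    "small": [v + 0.5 for v in base["small"]],
}
scores = ood_score_per_scale(base, ood_mock)
```

A score near zero means the test data look like the base distribution at that scale; a large positive score flags the scale where the model is misspecified, without needing any parameter labels.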
Aizhan Akhmetzhanova (Harvard)
["An LLM-driven framework for cosmological model-building and exploration" Mudur, Cuesta-Lazaro, Toomey (in prep)]
Propose a model for Dark Energy
Implement it in a Cosmology simulation code: CLASS
Test fit to DESI Observations
Iterate to improve fit
Quintessence, DE/DM interactions, ...
Must pass a set of general tests for "reasonable" models
Ideally, compare the Bayesian evidence against LCDM.
For now, use the Bayesian Information Criterion (BIC)
1
2
Nayantara Mudur (Harvard)
Thawing Quintessence
Axion-like Early Dark Energy
Ultra-light scalar field that temporarily acts as dark energy in the early universe
Implementation Challenge:
Dynamic dark energy model: scalar field transitions from "frozen" (cosmological constant-like) to evolving as the universe expands.
Oscillatory behaviour
Can take advantage of existing scalar field implementations in CLASS
+ 43,000 lines of C code
+ 10,000 lines of numerical files
CLASS Challenge:
1) Code compiles + obtains reasonable observables
2) Implementation agrees with target repository
3) Goodness of fit for DESI + Supernovae
4) H0 tension metrics
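The slide does not say which H0 tension metrics are used. One common and simple choice, assuming two independent measurements with Gaussian errors, is the difference in units of the combined standard deviation. The function name and the input values below are hypothetical placeholders, not measurements.

```python
import math


def gaussian_tension(value_a, sigma_a, value_b, sigma_b):
    """Difference between two measurements in units of their combined sigma."""
    return abs(value_a - value_b) / math.sqrt(sigma_a ** 2 + sigma_b ** 2)


# Placeholder inputs in km/s/Mpc, chosen only to illustrate the arithmetic.
tension_sigma = gaussian_tension(70.0, 1.0, 73.0, 1.0)
```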
Curated
One-page description of the model to be implemented, CLASS tips + very explicit units
Paper
Directly from a full paper
If it fails, get feedback from another LLM
Shortcut: field that produces this?
Asked for physical motivation. It tried :(
Not true, preferred scale
Reinforcement Learning
Update the base model weights to optimize a scalar reward (s)
DeepSeek R1
Base LLM
(being updated)
What rewards are more advantageous?
Base LLM
(frozen)
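To make "what rewards are more advantageous" concrete: DeepSeek R1 was trained with GRPO, which scores each rollout relative to the other rollouts sampled for the same prompt. The sketch below shows only that normalization step, not the policy update. Using the number of challenge stages passed as the reward is an assumption made for this example.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards: stages passed by four rollouts of the same task.
rewards = [0.0, 1.0, 1.0, 4.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat the group average get a positive advantage and are reinforced; the rest are pushed down. No separate value network is needed, only several rollouts per prompt.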
Develop basic skills: numerics, theoretical physics, UNIT CONVERSION
Community Effort!
Evolutionary algorithms
Learning in natural language, reflect on traces and results
Examples: EvoPrompt, FunSearch, AlphaEvolve
["GEPA: Reflective prompt evolution can outperform reinforcement learning" Agrawal et al]
GEPA: Evolutionary
GRPO: RL
+10% improvement over RL with 35x fewer rollouts
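The evolutionary alternative can be pictured as a loop that keeps a small pool of candidates, mutates one using feedback from its last evaluation, and keeps the best. The sketch below is a generic toy version of that loop, not GEPA itself. In a real system `evaluate` would run the task and `mutate` would be an LLM call that reflects on the trace; here both are hypothetical stand-ins acting on integers.

```python
import random


def evolve(seed_candidates, evaluate, mutate, n_generations=20, pool_size=4, seed=0):
    """Feedback-driven evolutionary search; returns (best candidate, best-score history)."""
    rng = random.Random(seed)
    # Each pool entry is ((score, feedback), candidate).
    pool = [(evaluate(c), c) for c in seed_candidates]
    history = []
    for _ in range(n_generations):
        (_, feedback), parent = rng.choice(pool)
        child = mutate(parent, feedback, rng)
        pool.append((evaluate(child), child))
        pool.sort(key=lambda item: item[0][0], reverse=True)
        pool = pool[:pool_size]  # keep only the highest-scoring candidates
        history.append(pool[0][0][0])
    return pool[0][1], history


TARGET = 42  # toy objective: find this integer


def toy_evaluate(candidate):
    """Return a score and a natural-language hint, standing in for a trace."""
    if candidate < TARGET:
        return -abs(candidate - TARGET), "too low"
    if candidate > TARGET:
        return -abs(candidate - TARGET), "too high"
    return 0, "ok"


def toy_mutate(candidate, feedback, rng):
    """Use the textual feedback to decide the direction of the change."""
    step = rng.randint(1, 5)
    if feedback == "too low":
        return candidate + step
    if feedback == "too high":
        return candidate - step
    return candidate


best, history = evolve([0, 100], toy_evaluate, toy_mutate, n_generations=200)
```

No weights are updated: all the learning lives in the candidates and in the feedback passed to the mutation step.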
Scientific reasoning with LLMs is still in its infancy!