[Slide: The Ohio State University — centers in the Physical Sciences, Environmental Sciences, and Biological Sciences (~2–3 each); 7 centers x 15M ~ 100M]
[Figure: constraints in the dark matter density (0.25–0.40) vs. growth amplitude (0.7–0.9) plane, Rubin Observatory]
The goal here is NOT just to solve faster the problems we can already solve,
but to solve astrophysical problems that would otherwise be too complex to tackle
My niece
Highly non-Gaussian
Weakly non-Gaussian
Cosmic large-scale structure
Data / Observation
Theory / Hypothesis
Analysis Pipelines
True
False
AlphaFold
Data / Observation
Theory / Hypothesis
Lambda-CDM
True
False
Data
Theory
State of the research
Making "plans"
Making "hypotheses"
Can A.I. agents understand spectral data (spectral energy distribution) from JWST?
A default fit with an SED model
Extinction model?
Young stellar population?
* In the classic tale of Faust, Mephisto is a demon who tempts the scholar Faust with knowledge and power in exchange for his soul.
Proposing actions
Execute actions
State evaluation
Knowledge distillation
Knowledge base
Proposing Actions - e.g., different physical models / parameter ranges
Execute Actions - write configuration files, run the codes autonomously
State Evaluation - evaluate the results (beyond a single error metric)
Knowledge Distillation - summarise useful actions given the previous state
" If the fit is overestimated in the UV and optical bands,
increasing the E_BV_lines parameter may lead to a better fit by accounting for more dust attenuation in these bands. "
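The four-stage loop above (propose, execute, evaluate, distill) can be sketched as a minimal agent scaffold. Everything here — the class name, the toy chi-square surrogate, the window-narrowing distillation rule — is an illustrative assumption, not the actual Mephisto implementation:

```python
import random

class AgentLoop:
    """Toy propose / execute / evaluate / distill loop over one parameter."""

    def __init__(self):
        self.knowledge = []              # distilled hints from earlier iterations
        self.best = (None, float("inf"))

    def propose(self):
        # Proposing actions: pick a candidate parameter value, biased toward
        # the region the knowledge base last flagged as promising.
        lo, hi = (self.knowledge[-1] if self.knowledge else (0.0, 10.0))
        return random.uniform(lo, hi)

    def execute(self, x):
        # Execute actions: stand-in for writing a config file and running
        # the SED fitter; returns a pretend chi-square with minimum at x = 3.
        return (x - 3.0) ** 2 + 1.0

    def evaluate(self, x, chi2):
        # State evaluation: compare the result against the best so far.
        if chi2 < self.best[1]:
            self.best = (x, chi2)

    def distill(self):
        # Knowledge distillation: record a narrowed search window around
        # the best parameter found, to guide the next proposal.
        x, _ = self.best
        self.knowledge.append((max(0.0, x - 1.0), min(10.0, x + 1.0)))

    def run(self, iterations=30):
        for _ in range(iterations):
            x = self.propose()
            chi2 = self.execute(x)
            self.evaluate(x, chi2)
            self.distill()
        return self.best
```

In the real system each stage is mediated by an LLM (proposing physically motivated parameter ranges, reading fit residuals, writing natural-language hints like the quote above) rather than the numeric rules used in this sketch.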
[Figure: chi-square of the fit vs. number of learning iterations (0–30), fitting JWST JADES data; GPT-4o baseline ("think without knowledge") shown for comparison]
[Figure: chi-square of the fit vs. number of learning iterations for Mephisto on COSMOS2020 SEDs; GPT-4o baseline ("think without knowledge") shown for comparison]
Mephisto finds better solutions using only 1% of the trials that brute-force methods require
[Figure: SED fit — flux vs. wavelength (micron)]
Learn from the data
Summarize "knowledge"
Examine and include prior knowledge
Expedite discovery
Use the learned knowledge as context
https://tingyuansen.github.io/NASA_AI_ML_STIG
Next Monday (4pm ET)
Graduate student
Princeton Language and Intelligence Lab, June 2024
Human accuracy: ~80%
GPT-4o: ~47%
ARC Prize Foundation (ARC-AGI-2, 2025)
Human Panel: ~100%
GPT-5: ~10%
Complex calculations
Logical inference (?)
Memorizing information
Language
Coding
Spatial reasoning
Common sense physics
(water flows downhill)
Basic motor skills
Visual reasoning
Understanding context
AI is still 20–50 points worse than humans
Brute-force fine-tuning can close the gap on simple descriptive tasks, but not on visual reasoning tasks
Cosmology
Galaxy
High-energy
Sun/Star
Exoplanet
Simulation
Instrument
AI/Stat
Applications of AI in Stats
e.g., GPT-5
In the SED case study, we need ~0.1M tokens per source
= USD 1 per source ...
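The per-source cost quoted above follows from simple token arithmetic; the USD 10 per million tokens used here is an assumed, representative frontier-model API price, not a figure from the talk:

```python
tokens_per_source = 0.1e6        # ~0.1M tokens per SED source (from the case study)
usd_per_million_tokens = 10.0    # assumed API price, for illustration only

cost_per_source = tokens_per_source / 1e6 * usd_per_million_tokens
print(f"~USD {cost_per_source:.2f} per source")  # ~USD 1.00 per source
```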
e.g., Roman Space Telescope, Euclid Space Telescope
Natural Language Processing experts
Oak Ridge National Lab
Argonne National Lab
Harvard-Smithsonian ADS
U. Illinois Urbana-Champaign
Knowledge Recall
What is the primary reason for the decline in the number density of luminous quasars at redshifts greater than 5?
A. A decrease in the overall star formation rate, leading to fewer potential host galaxies for quasars.
B. An increase in the neutral hydrogen fraction in the intergalactic medium, which obscures the quasars’ light.
C. A decrease in the number of massive black hole seeds that can form and grow into supermassive black holes.
D. An increase in the average metallicity of the Universe, leading to a decrease in the efficiency of black hole accretion.
[Figure: score (%) vs. cost per 1 SED source (USD); domain experts score ~67%, 20 points below AI; AstroSage-8B (de Haan, YST+ 2025a) and AstroSage-70B (de Haan, YST+ 2025b) marked]
JWST SED Fitting
Making "plans"
Making "hypotheses"
Annotated / Labelled Data
Unlabelled Data
Interacting with "physical" world
" If there is a gross underestimation in the MWIR bands,
consider exploring a wider range of fracAGN values in the agn module to improve the fit in these bands "
[Figure: chi-square of the fit vs. number of learning iterations (0–30)]
Why this plateau?
[Figure: fit quality vs. number of learning iterations, now scoring chi-square of the fit minus the number of photometry bands fitted within 1σ]
"Exploration"
"Exploitation"
300,000 papers → Mistral 7B → 1,000,000 concepts
Mistral 7B: concept merging and pruning
spectra = spectroscopy = spectral analysis
Too granular
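Merging synonymous concepts such as "spectra" = "spectroscopy" = "spectral analysis" amounts to mapping each name to a canonical representative. In this sketch the synonym groups are simply given; in the actual pipeline an LLM proposes them:

```python
def merge_concepts(concepts, synonym_groups):
    """Map each concept to a canonical name and drop duplicates.

    synonym_groups: list of sets of names judged equivalent
    (assumed supplied by an LLM pass; hard-coded here for illustration).
    """
    canonical = {}
    for group in synonym_groups:
        head = sorted(group)[0]          # deterministic representative
        for name in group:
            canonical[name] = head
    merged = {canonical.get(c, c) for c in concepts}
    return sorted(merged)

groups = [{"spectra", "spectroscopy", "spectral analysis"}]
result = merge_concepts(["spectra", "spectroscopy", "dark matter"], groups)
# → ["dark matter", "spectra"]
```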
[Figure: count of scientific concepts (thousands, ~7–11k) per year, 2000–2020]
[Figure: counts (thousands, 0.3–1.5) per year, 2000–2020, for the concepts "numerical simulation" and "statistics"]
[Figure: count (thousands) per year, 2000–2020, of machine-learning concepts — Linear Regression, Gaussian Process, Random Forest, ......]
[Diagram: bipartite concept–paper graph. Papers (e.g., Ting et al., Einstein et al.) contain concepts (e.g., Concept A: Dark Matter; Concept B: Plasmon) and are connected by citations. Distance between concept A and B = (…) averaged over all papers containing concept A. Example pair: technical concept "Neural Networks" x scientific concept "Large-Scale Structure".]
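One plausible reading of the concept-to-concept distance above (averaged over all papers containing concept A) is a shortest-path search over the citation graph; the graph structure and distance function below are illustrative assumptions, not the slide's exact definition:

```python
from collections import deque

def concept_distance(papers_with_a, papers_with_b, citations):
    """Average, over papers containing concept A, of the shortest
    citation-graph distance to any paper containing concept B.
    """
    targets = set(papers_with_b)
    total = 0
    for start in papers_with_a:
        # Breadth-first search over the citation graph (treated as undirected).
        seen, queue = {start}, deque([(start, 0)])
        dist = None
        while queue:
            node, d = queue.popleft()
            if node in targets:
                dist = d
                break
            for nxt in citations.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
        # Unreachable pairs get a large penalty distance.
        total += dist if dist is not None else len(citations) + 1
    return total / len(papers_with_a)

# Toy graph: "ting" cites "einstein", which cites "zwicky".
cites = {"ting": ["einstein"], "einstein": ["ting", "zwicky"], "zwicky": ["einstein"]}
d = concept_distance(["ting"], ["zwicky"], cites)  # → 2.0
```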
[Figure: log average linkage (−4.6 to −4.0) vs. year (2000–2020), numerical simulation x scientific concepts. Technology development phase: while simulations are being developed, their linkage to scientific concepts (e.g., Large-Scale Structure) is decoupled.]
[Figure: same linkage curve. Technology deployment phase: as simulations are deployed to the sciences, the linkage to scientific concepts (e.g., Large-Scale Structure) increases.]
[Figure: log average linkage vs. year (2000–2020), numerical simulation x scientific concepts, split into N-body simulation and hydrodynamical simulation]
[Figure: log average linkage vs. year (2000–2020), ML x scientific concepts, e.g., Gaussian process and multi-layer perceptron]
[Figure: score (%) vs. cost per 1 SED source (USD), July 2024; annotations: "worse than GPT-4o", "cheaper but not as good", "domain experts". Models can vary by three orders of magnitude in "value"!]
[Figure: calibration plot — fraction of correct answers (%) vs. stated confidence (%), 50–100, with under-confident and over-confident regions marked; models pre-summer 2024 vs. after summer 2024]
[Figure: score (%), 60–90, as of July 2024]
LLaMA-3.1 70B throughput on four H100 GPUs = ~100 tokens/second
1 SED source = 15 GPU-minutes
1B sources = 10M GPU-days
= a cluster with 10,000 H100 GPUs running for 3 years
(cost markers on the slide: 0.03 USD, 40 USD)
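The arithmetic behind these numbers checks out directly. Tokens-per-source and throughput are the slide's figures; the rest follows (the ~1,000 s per source is roughly consistent with the quoted 15 GPU-minutes):

```python
tokens_per_source = 1e5      # ~0.1M tokens per SED source (from the case study)
tokens_per_second = 100      # LLaMA-3.1 70B on four H100 GPUs (slide figure)

seconds_per_source = tokens_per_source / tokens_per_second  # 1,000 s
gpu_days = seconds_per_source / 86400 * 1e9                 # ~10M GPU-days for 1B sources
years_on_cluster = gpu_days / 10_000 / 365                  # ~3 years on 10,000 H100s
```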
Compute Power vs. Year
CPU Moore's Law is plateauing
GPU is picking up the pace
The price drop has an e-folding time of approximately 3 months
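An e-folding time of ~3 months means the cost follows cost(t) = cost(0) · exp(−t / 3) with t in months, so over one year the price falls by a factor of e⁴ ≈ 55. A quick check (the exponential form is the standard reading of "e-folding time", not stated on the slide):

```python
import math

def cost_after(months, initial_cost, efold_months=3.0):
    """Exponential price decline with the stated ~3-month e-folding time."""
    return initial_cost * math.exp(-months / efold_months)

drop_per_year = cost_after(0, 1.0) / cost_after(12, 1.0)  # e**4 ≈ 54.6x
```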
[Figure: score (%) vs. cost per 1 SED source (USD), models before July 2024]
[Figure: + 3 months — Google Gemma-2, Google Gemini-1.5, DeepSeek v2; open-weight vs. proprietary marked]
[Figure: + 3 months — Alibaba Qwen-2.5, Meta LLaMA 3, Yi 01, X's Grok, Stepfun, Microsoft Phi-3.5, Nvidia's Nemotron; open-weight vs. proprietary marked]
[Figure: + 3 months — DeepSeek v3 / R1; proprietary (experimental / not released) models marked]
[Figure: + 3 months — OpenAI o3, Google Gemini-2.0; proprietary (experimental / not released) models marked]
[Figure: + 3 months — Microsoft Phi-4, MiniMax 01, Gemini-2.5-Pro, Claude-3.7-Sonnet, Meta LLaMA 4]
e.g., Roman Space Telescope, Euclid Space Telescope
[Diagram: Data-poor, Theory-rich → (collecting more data: Roman, HSC, Euclid, DESI, SDSS, PFS) → Data-rich, Theory-poor; next step marked "???"]