Yuan-Sen Ting (丁源森)

The Ohio State University

Expediting Discoveries in Astronomy with A.I. Agents

NSF awarded over $200 million for AI Research Institutes

~ 2 centers

Physical Sciences

~ 3 centers

7 centers x 15M ~ 100M

Environmental Sciences

Biological Sciences

Hype, myth, or real deal?

Why hasn't astronomy had its
"AlphaFold" moment yet?

YST, Annual Review of Astronomy and Astrophysics, arXiv: 2510.10713

Most AI in Astronomy focuses on extending statistical methods

[Figure: \sigma_8 (growth amplitude) vs. \Omega_M (dark matter density)]

E.g.,
simulation-based
inference

Sihao Cheng, YST+, 2020

Applying A.I. to individual tasks
will have limited impacts in astrophysics

The complexity of astronomy is too low for AI

My niece

Highly non-Gaussian

Weakly non-Gaussian

Cosmic large-scale structure

Astronomy is not biology

Data / Observation

Theory / Hypothesis

Analysis Pipelines

True

False

Biology faced fundamental bottlenecks from individual tasks

Data / Observation

Theory / Hypothesis

Analysis Pipelines

True

False

AlphaFold

Most astronomical tasks already have working heuristics

Data / Observation

Theory / Hypothesis

ΛCDM

True

False

Toward Agentic Research for Astronomy

Data

Theory

State of the research

Making "plans"

Harness reasoning

Beyond just individual task optimizations

A.I. in Math Olympiads

A.I. in Astronomy Olympiads

Pinheiro, ..., YST+, 2025

In an open-world setting, can large language models match human researchers at expediting
data exploration?

??

Can A.I. agents understand spectral data (spectral energy distribution) from JWST?

Real-world reasoning extends far beyond algorithmic formalism

A default fit with
an SED model

Extinction model ?

Young stellar population?

Many real-world problems aren't simple optimization problems

The objective goes beyond minimizing a single error metric.

Many tasks may require modifying assumptions / physical models, not just optimizing over all parameters

Action spaces are vast and hard to parameterize.

Can a large-language model learn
from its own experience?

Human "intuition" + experience

Introducing Mephisto*

* In the classic tale of Faust, Mephisto is a demon who tempts the scholar Faust with knowledge and power in exchange for his soul.

A collaboration of multiple AI agents (LLMs)

Proposing actions

Executing actions

State evaluation

Knowledge distillation

Enabling AI to collect "knowledge" through exploration

Knowledge base

Proposing Actions - e.g., different physical models / parameter ranges

Executing Actions - write configuration files and run the codes autonomously

State Evaluation - evaluate the results (beyond a single error metric)

Knowledge Distillation - summarise useful actions given the previous state
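
The four steps above can be sketched as a loop. This is a toy illustration only, not the Mephisto implementation: the function names (`propose_action`, `execute_action`, `evaluate_state`, `distill_knowledge`), the one-parameter straight-line "model", and the rule-based stand-ins for LLM calls are all assumptions made for this example.

```python
import random

# Toy "observation": points on the line y = 2x (hypothetical data).
# A real agent would call an LLM and an SED-fitting code instead.
X = [0.0, 1.0, 2.0, 3.0]
Y_OBS = [0.0, 2.0, 4.0, 6.0]


def execute_action(slope):
    """Execute: run the toy model with the proposed parameter."""
    return [slope * x for x in X]


def evaluate_state(y_model):
    """Evaluate: return a misfit score plus a qualitative description of the bias."""
    residuals = [m - o for m, o in zip(y_model, Y_OBS)]
    score = sum(r * r for r in residuals)
    bias = "overestimated" if sum(residuals) > 0 else "underestimated"
    return score, bias


def propose_action(slope, knowledge, rng):
    """Propose: follow a distilled rule if one applies, otherwise explore at random."""
    step = rng.uniform(0.1, 0.5)
    _, bias = evaluate_state(execute_action(slope))
    if bias in knowledge:
        return slope + knowledge[bias] * step
    return slope + rng.choice([-1, 1]) * step


def distill_knowledge(knowledge, bias_before, old_slope, new_slope, improved):
    """Distill: remember which direction helped for a given kind of misfit."""
    if improved:
        knowledge[bias_before] = 1 if new_slope > old_slope else -1


def run(n_iter=50, seed=0):
    """One 'walker': iterate propose -> execute -> evaluate -> distill."""
    rng = random.Random(seed)
    slope, knowledge = 5.0, {}
    best, _ = evaluate_state(execute_action(slope))
    for _ in range(n_iter):
        _, bias_before = evaluate_state(execute_action(slope))
        candidate = propose_action(slope, knowledge, rng)
        score, _ = evaluate_state(execute_action(candidate))
        improved = score < best
        distill_knowledge(knowledge, bias_before, slope, candidate, improved)
        if improved:
            slope, best = candidate, score
    return slope, best, knowledge


if __name__ == "__main__":
    final_slope, final_score, rules = run()
    print(final_slope, final_score, rules)
```

The knowledge base here is just a dictionary mapping a described misfit to a helpful direction; in the setting described in the slides, it would hold natural-language statements distilled by the agents.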

Mephisto - deployed as "walkers" in the action space

LLMs with self-improvement outperform native LLMs

Fitting JWST JADES data

[Figure: chi-square of the fit vs. number of learning iterations, comparing Mephisto with the GPT-4o baseline ("think without knowledge")]

Sun, YST+, 2024
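
For reference, the chi-square of a fit in its standard form is the sum of squared residuals weighted by the measurement uncertainties. A minimal sketch, assuming independent Gaussian errors; the function name and the sample numbers are made up for illustration and are not JADES data. As noted earlier in the slides, the agents' state evaluation goes beyond this single metric.

```python
def chi_square(observed, model, sigma):
    """Standard chi-square: sum over points of ((observed - model) / sigma)^2."""
    if not (len(observed) == len(model) == len(sigma)):
        raise ValueError("observed, model, and sigma must have the same length")
    return sum(((o - m) / s) ** 2 for o, m, s in zip(observed, model, sigma))


# Illustrative photometry in arbitrary flux units (made-up numbers).
flux_obs = [1.0, 2.0, 3.0]
flux_err = [0.5, 0.5, 0.5]
good_fit = [1.0, 2.5, 3.0]
poor_fit = [2.0, 3.0, 4.0]

if __name__ == "__main__":
    print(chi_square(flux_obs, good_fit, flux_err))
    print(chi_square(flux_obs, poor_fit, flux_err))
```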

Example of learned "knowledge"

"If the fit is overestimated in the UV and optical bands, increasing the E_BV_lines parameter may lead to a better fit by accounting for more dust attenuation in these bands."

Sun, YST+, 2025
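
Why more reddening lowers the model flux most strongly at short wavelengths can be seen from the general attenuation relation A_lambda = k(lambda) * E(B-V), with the flux dimmed by a factor 10^(-0.4 * A_lambda). The sketch below uses that relation only; the `k_lambda` values are invented for illustration, not a published attenuation curve, and how the SED code actually applies `E_BV_lines` is not specified in these slides.

```python
def attenuate(flux_intrinsic, k_lambda, ebv):
    """Dim each band by A_lambda = k_lambda * E(B-V) magnitudes."""
    return [f * 10.0 ** (-0.4 * k * ebv) for f, k in zip(flux_intrinsic, k_lambda)]


# Hypothetical attenuation-curve values, larger at shorter wavelengths.
bands = ["UV", "optical", "near-IR"]
k_lambda = [10.0, 4.0, 1.0]
flux = [1.0, 1.0, 1.0]

low = attenuate(flux, k_lambda, ebv=0.1)
high = attenuate(flux, k_lambda, ebv=0.3)

if __name__ == "__main__":
    for band, lo_f, hi_f in zip(bands, low, high):
        print(band, round(lo_f, 3), round(hi_f, 3))
```

Raising E(B-V) dims every band, but the UV far more than the near-IR, which is the behavior the learned rule exploits.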

Mephisto operates as walkers exploring the "hypothesis space"

With COSMOS2020 SEDs

Mephisto finds better solutions using only 1% of the trials that brute-force methods require

https://tingyuansen.github.io/NASA_AI_ML_STIG/

YST+, 2025

Fitting equivalent widths used to require human judgment

E.g., deciding whether there's an unresolved blend of lines

E.g., adjusting for the continuum

YST+, 2025
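
To make the continuum dependence concrete, the equivalent width of an absorption line is the integral of (1 - F / F_c) over wavelength, where F_c is the continuum. A minimal sketch using the trapezoidal rule; the function name and the triangular line profile are made up for illustration and are not the method used in the cited work.

```python
def equivalent_width(wavelength, flux, continuum):
    """Trapezoidal estimate of EW = integral of (1 - F / F_c) d(lambda)."""
    depth = [1.0 - f / c for f, c in zip(flux, continuum)]
    ew = 0.0
    for i in range(len(wavelength) - 1):
        dlam = wavelength[i + 1] - wavelength[i]
        ew += 0.5 * (depth[i] + depth[i + 1]) * dlam
    return ew


# Made-up triangular absorption line on a 1 Angstrom grid.
wave = [5000.0, 5001.0, 5002.0, 5003.0, 5004.0]
flux = [1.0, 0.75, 0.5, 0.75, 1.0]

ew_unit = equivalent_width(wave, flux, [1.0] * 5)   # continuum placed at 1.0
ew_high = equivalent_width(wave, flux, [1.25] * 5)  # continuum placed 25% higher

if __name__ == "__main__":
    print(ew_unit, ew_high)
```

The same line yields a noticeably larger equivalent width when the continuum is placed higher, which is why continuum placement has traditionally been a judgment call.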

What took a trained postdoc six months now costs ~$100 with agents

Liu, YST+, 2024

Zooniverse.org

Agents can sift through hundreds of millions of ASAS-SN
light curves and reason their way to interesting outliers

Pesta & YST, in prep.

[Figure: phase-folded light curves, magnitude vs. phase (-0.5 to 0.5)]
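
A phase axis running from -0.5 to 0.5 is what results from folding observation times on a trial period. A minimal sketch of that folding step; the function name, the reference epoch `t0`, and the sample times are assumptions for illustration, and the 2.3-day period is borrowed from a label elsewhere in these slides purely as an example value.

```python
def fold(times, period, t0=0.0):
    """Map observation times to orbital phase in [-0.5, 0.5)."""
    phases = []
    for t in times:
        cycle = (t - t0) / period
        # Shift by half a cycle, wrap into [0, 1), then shift back.
        phases.append((cycle + 0.5) % 1.0 - 0.5)
    return phases


# Made-up observation times in days.
times = [0.0, 0.575, 1.15, 2.3, 2.875]
phases = fold(times, period=2.3)

if __name__ == "__main__":
    print(phases)
```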

Caught in the brief, unstable evolutionary semi-detached phase

A rare alignment of
a massive supergiant
in a 13-year orbit

P=13 years

P=2.3 days

Graduate student / Postdoc

The Plot Twist

A.I. still struggles with many tasks that are easy for humans

Princeton Language and Intelligence Lab, June 2024

Human accuracy: ~80%

GPT-4o: ~47%

Can A.I. reason about scientific charts?

ARC Prize Foundation (ARC-AGI-2, 2025)

Spatial Pattern Reasoning

Human Panel : ~ 100%

GPT-5 : ~10%

Moravec's Paradox (1988)

- Things that seem easy for humans might be hard for computers, and vice versa

Reversing the evolution of "intelligence"

Evolution Timeline: What came first vs. last

Conversational abilities
are the easiest to imitate

A lot of our holistic abilities were developed much earlier

Easy-for-AI

Complex calculations

Easy-for-Human

Logical inference (?)

Memorizing information

Language

Coding

Spatial reasoning

Common sense physics
(water flows downhill)

Basic motor skills

Visual reasoning

Understanding context