AION-1: Omnimodal Foundation Model for Astronomical Sciences

L. Parker, F. Lanusse, J. Shen, O. Liu, T. Hehir, L. Sarra, L. Meyer, M. Bowles, S. Wagner-Carena, H. Qu, S. Golkar, A. Bietti, H. Bourfoune, P. Cornette, K. Hirashima, G. Krawezik, R. Ohana, N. Lourie, M. McCabe, R. Morel, P. Mukhopadhyay, M. Pettee, B. Regaldo-Saint Blancard, K. Cho, M. Cranmer, S. Ho

Survey Astronomy:
A Data-Rich and Multimodal Scientific Domain

Credit: DESI collaboration/DESI Legacy Imaging Surveys/LBNL/DOE, KPNO/CTIO/NOIRLab/NSF/AURA/unWISE

Modern astronomy relies on large surveys of the sky with a variety of instruments, leading to a wide diversity of data types.

Our objective: build an Omnimodal Foundation Model that bridges data silos to unlock cross-modal insights.

120 TB of Training Data, Spanning Diverse Science Cases

(Blanco Telescope and Dark Energy Camera.
Credit: Reidar Hahn/Fermi National Accelerator Laboratory)

(Subaru Telescope and Hyper Suprime Cam. Credit: NAOJ)

(Dark Energy Spectroscopic Instrument)

(Sloan Digital Sky Survey. Credit: SDSS)

(Gaia Satellite. Credit: ESA/ATG)

  • Galaxy formation
  • Cosmology
  • Stellar physics
  • Galaxy archaeology
  • ...

 

Standardizing all Modalities Through Tokenization 

  • For each modality class (e.g. image, spectrum), we build dedicated metadata-aware tokenizers
     
  • For AION-1, we integrate 39 different modalities (different instruments, different measurements, etc.)
\mathcal{L} = \left\| \Sigma^{-\frac{1}{2}} \left( x - d_\theta\left( \lfloor e_\theta(x) \rfloor_{\text{FSQ}} \right) \right) \right\|_2^2
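
A minimal sketch of the two ingredients in the tokenizer loss above: finite scalar quantization (FSQ) of the encoder output, and a reconstruction error weighted by the inverse noise covariance. It assumes a diagonal Σ (one noise variance per pixel or spectral bin) and an odd number of levels per latent dimension. The encoder e_θ and decoder d_θ are omitted, and the function names are hypothetical, not taken from the AION-1 code.

```python
import math


def fsq_quantize(latents, levels):
    """Bound each latent with tanh, then round it to the nearest integer code.

    Assumes every entry of `levels` is odd, so the codes are the integers
    in [-(L - 1) / 2, (L - 1) / 2] for a dimension with L levels.
    """
    codes = []
    for value, n_levels in zip(latents, levels):
        half = (n_levels - 1) // 2
        codes.append(round(math.tanh(value) * half))
    return codes


def weighted_reconstruction_loss(x, x_hat, variance):
    """Squared error || Sigma^{-1/2} (x - x_hat) ||_2^2 for a diagonal Sigma."""
    return sum((xi - xh) ** 2 / v for xi, xh, v in zip(x, x_hat, variance))


# Toy usage: three latent dimensions with 5 levels each, then a 2-pixel loss.
codes = fsq_quantize([0.0, 10.0, -10.0], [5, 5, 5])
loss = weighted_reconstruction_loss([1.0, 2.0], [0.0, 0.0], [1.0, 4.0])
```

Dividing by the variance down-weights noisy pixels, so the tokenizer is not penalized for failing to reproduce noise. In actual training the rounding step would also need a gradient workaround such as a straight-through estimator, which is left out here.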

Any-to-Any Modeling with Generative Masked Modeling

  • Training is done by pairing observations of the same objects from different instruments.
     
  • Model is trained by cross-modal generative masked modeling (Mizrahi et al. 2023)
    => Learns the joint and all conditional distributions of provided modalities:  
\forall m,n \quad p(x_m | x_n)
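
One way to picture cross-modal masked modeling is as a sampling step: pool the tokens of all modalities observed for an object, show a random subset to the model, and ask it to predict another disjoint subset. The sketch below illustrates only that sampling step, under the assumption of fixed input and target token budgets. The function name, modality names, and budgets are hypothetical and are not the AION-1 training configuration.

```python
import random


def sample_masked_example(tokens_by_modality, n_input, n_target, rng):
    """Split an object's tokens into visible inputs and disjoint prediction targets.

    Each token is tagged with its modality name and position so the model
    knows what it is looking at and what it is asked to predict.
    """
    pool = [
        (name, position, token)
        for name, tokens in tokens_by_modality.items()
        for position, token in enumerate(tokens)
    ]
    rng.shuffle(pool)
    visible = pool[:n_input]
    targets = pool[n_input:n_input + n_target]
    return visible, targets


# Toy usage with two made-up modalities for a single object.
rng = random.Random(0)
example = {"image": [11, 12, 13, 14], "spectrum": [21, 22, 23]}
visible, targets = sample_masked_example(example, n_input=3, n_target=2, rng=rng)
```

Because the visible and target subsets are drawn at random across modalities, a single training objective covers every conditional p(x_m | x_n), including cases where an entire modality is hidden.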

AION-1 Family of Models

Models trained on the Jean Zay 4 Supercomputer (1400 H100 GPUs)

Examples of Out-of-the-Box Capabilities

Survey translation

p(\bm{x}_{HSC} | \bm{x}_{DES} )

Spectrum super-resolution

p(\bm{x}_{DESI} | \bm{x}_{GAIA} )

Adaptation to Downstream Scientific Use-Cases

DINOv2

Example-Based Retrieval of Rare Objects

nDCG@10 score

From a pool of 500,000 candidate lenses...
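
For reference, the nDCG@10 metric named on this slide can be computed as in the sketch below. It assumes the input is the list of relevance labels (for example 1 for a true lens, 0 otherwise) for the whole candidate pool, ordered by the retrieval ranking. The function names are illustrative, and this is the standard definition of the metric rather than the authors' evaluation code.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: relevance discounted by log2 of the rank."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the top-k results, normalized by the DCG of the ideal ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True)[:k])
    if ideal == 0:
        return 0.0  # no relevant item in the pool
    return dcg(ranked_relevances[:k]) / ideal


# Toy usage: a ranking that places both relevant objects first is perfect.
perfect = ndcg_at_k([1, 1, 0, 0])
```

A score of 1 means the top 10 retrieved objects are ordered as well as possible. Relevant objects that appear lower in the ranking contribute less because of the logarithmic discount.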

Please Come See Us at Our Poster for More!

Thank you for watching!

NeurIPS AION-1

By eiffl


Short talk for NeurIPS
