AAS 247 - Special Session: Advancing AI Infrastructure for Large Astronomy Datasets
François Lanusse
CNRS Researcher @ AIM, CEA Paris-Saclay
Polymathic AI
Conventional scientific workflow with deep learning
Conventional researchers @ CMU
Circa 2016
CMU DeepLens (Lanusse et al. 2017)
Foundation Model-based Scientific Workflow
Already taken care of
=> Greatly reduces time to science
=> Alright, let's make it happen!
Credit: Melchior et al. 2021
Credit: DESI collaboration/DESI Legacy Imaging Surveys/LBNL/DOE & KPNO/CTIO/NOIRLab/NSF/AURA/unWISE
Collaborative project with about 30 contributors
Presented at NeurIPS 2024 Datasets & Benchmarks track
Ground-based imaging from Legacy Survey
Space-based imaging from JWST
Most General
Most Specific
Single model capable of processing all types of data
Independent models for all types of data
Bytes Are All You Need (Horton et al. 2023)
AstroCLIP (Parker et al. 2024)
AstroCLIP
Early Fusion Multimodal Models
Accepted at NeurIPS 2025, spotlight presentation at NeurIPS 2025 AI4Science Workshop
Project led by:
François Lanusse
Liam Parker
Jeff Shen
Tom Hehir
Ollie Liu
Lucas Meyer
Sebastian Wagner-Carena
Helen Qu
Micah Bowles
(Blanco Telescope and Dark Energy Camera. Credit: Reidar Hahn/Fermi National Accelerator Laboratory)
(Subaru Telescope and Hyper Suprime Cam. Credit: NAOJ)
(Dark Energy Spectroscopic Instrument)
(Sloan Digital Sky Survey. Credit: SDSS)
(Gaia Satellite. Credit: ESA/ATG)
Cuts: extended, full color griz, z < 21
Cuts: extended, full color grizy, z < 21
Cuts: parallax / parallax_error > 10
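The parallax cut above is a signal-to-noise threshold, and it can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `passes_parallax_cut`, the toy catalog rows, and the choice to reject non-positive errors are hypothetical and not taken from the project's code.

```python
def passes_parallax_cut(parallax, parallax_error, min_snr=10.0):
    """Return True when parallax / parallax_error exceeds min_snr."""
    if parallax_error <= 0:
        # Assumption: a non-positive error is treated as an invalid measurement.
        return False
    return parallax / parallax_error > min_snr


# Hypothetical catalog rows: (source_id, parallax [mas], parallax_error [mas])
catalog = [
    ("a", 5.0, 0.2),   # ratio 25 -> kept
    ("b", 1.0, 0.2),   # ratio 5  -> dropped
    ("c", -0.5, 0.1),  # ratio -5 -> dropped
]

selected = [row[0] for row in catalog if passes_parallax_cut(row[1], row[2])]
print(selected)
```

A cut of this kind keeps only sources whose parallax is measured at more than ten times its uncertainty, which also removes negative parallaxes.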
Adaptation at low cost
with simple strategies:
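# Pseudocode: adapt the pretrained Aion-B model to HSC images
# Tokenize the HSC images into the model's input representation
x_train = Tokenize(hsc_images, modality='HSC')
# Wrap the base model, using attentive pooling as the adaptation strategy
model = FineTunedModel(base='Aion-B',
                       adaptation='AttentivePooling')
# Fit on labelled examples, then predict on held-out tokenized inputs
model.fit(x_train, y_train)
y_test = model.predict(x_test)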
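To make the `adaptation='AttentivePooling'` option more concrete, here is a minimal sketch of what attentive pooling generally does: a single query vector scores each token embedding, the scores go through a softmax, and the tokens are averaged with those weights. This is not the AION implementation. The function `attentive_pool` is hypothetical, the query is fixed rather than learned, and the tokens serve directly as keys and values (a real layer would normally add learned projections).

```python
import math


def attentive_pool(tokens, query):
    """Pool N token embeddings into one vector using a single attention query.

    tokens: list of N embeddings, each a list of D floats
    query:  list of D floats (learned in practice; fixed here for illustration)
    Returns (pooled_vector, attention_weights).
    """
    dim = len(query)
    # Scaled dot-product score between the query and every token
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(dim)
        for tok in tokens
    ]
    # Numerically stable softmax over the sequence
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the token embeddings
    pooled = [
        sum(w * tok[j] for w, tok in zip(weights, tokens))
        for j in range(dim)
    ]
    return pooled, weights


# Toy example: three 2-dimensional "token embeddings"
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attentive_pool(tokens, query=[10.0, 0.0])
print(weights)
```

The pooled vector would then feed a small prediction head, so only the query and the head need training rather than the full base model. That reading of "low cost" is an assumption about the setup, not a statement from the slides.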
Inputs: measured fluxes
Inputs: measured fluxes + image
Trained on ->
Eval on ->
DINOv2
Segmenting central bar and spiral arms in galaxy images based on Galaxy Zoo 3D
nDCG@10 score
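For readers unfamiliar with the metric, nDCG@10 (normalized discounted cumulative gain over the top 10 results) can be sketched as follows. This sketch assumes the linear-gain form of DCG and that `relevances` holds the graded relevance of every candidate in ranked order. The exact variant used for the results shown is not stated on the slide, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank 0 gets discount log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of retrieved items, in the order the model ranked them
perfect = ndcg_at_k([3, 2, 1, 0])
swapped = ndcg_at_k([0, 1])
print(perfect, swapped)
```

A score of 1 means the most relevant items are ranked first, and lower scores indicate that relevant items appear further down the top 10.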
Thank you for listening!