Francois Lanusse
CNRS
astro-ph abstracts mentioning Deep Learning, CNN, or Neural Networks
The vast majority of these results have relied on supervised learning and networks trained from scratch.
=> In practice, this limits the ease of using deep learning for analysis and discovery
=> Alright, let's make it happen!
Collaborative project with about 30 contributors
Presented at NeurIPS 2024 in the Datasets & Benchmarks track
Multiband images from Legacy Survey
hsc
├── hsc.py
├── pdr3_dud_22.5
│ ├── healpix=1104
│ │ └── 001-of-001.hdf5
│ ├── healpix=1105
│ │ └── 001-of-001.hdf5
│ ├── healpix=1106
│ │ └── 001-of-001.hdf5
│ ├── healpix=1107
│ │ └── 001-of-001.hdf5
│ ├── healpix=1171
│ │ └── 001-of-001.hdf5
│ ├── healpix=1172
│ │ └── 001-of-001.hdf5
│ ├── healpix=1174
│ │ └── 001-of-001.hdf5
│ ├── healpix=1175
│ │ └── 001-of-001.hdf5
│ ├── healpix=1702
│ │ └── 001-of-001.hdf5
...
from datasets import load_dataset

# Open the Hugging Face dataset in streaming mode (no full download needed)
dset_ls = load_dataset("MultimodalUniverse/legacysurvey",
                       streaming=True,
                       split='train')
dset_ls = dset_ls.with_format("numpy")
dset_iterator = iter(dset_ls)

# Draw one example from the dataset iterator
example = next(dset_iterator)

# Let's inspect what is contained in an example
print(example.keys())
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 5))
for i, b in enumerate(example['image']['band']):
    plt.subplot(1, 4, i + 1)
    plt.title(f'{b}')
    plt.imshow(example['image']['flux'][i], cmap='gray_r')
    plt.axis('off')
dict_keys(['image', 'blobmodel', 'rgb', 'object_mask', 'catalog', 'EBV', 'FLUX_G', 'FLUX_R', 'FLUX_I', 'FLUX_Z', 'FLUX_W1', 'FLUX_W2', 'FLUX_W3', 'FLUX_W4', 'SHAPE_R', 'SHAPE_E1', 'SHAPE_E2', 'object_id'])
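The example above exposes per-band catalog fluxes (`FLUX_G`, `FLUX_R`, ...). Legacy Survey catalog fluxes are reported in nanomaggies, for which the AB magnitude is m = 22.5 - 2.5 log10(flux); a minimal conversion sketch (the function name is ours, not part of the dataset API):

```python
import numpy as np

def nanomaggies_to_mag(flux):
    """AB magnitude from a flux in nanomaggies: m = 22.5 - 2.5 log10(flux)."""
    flux = np.asarray(flux, dtype=float)
    safe = np.where(flux > 0, flux, np.nan)  # magnitude undefined for non-positive flux
    return 22.5 - 2.5 * np.log10(safe)

print(nanomaggies_to_mag([1.0, 100.0]))  # [22.5, 17.5]
```

This can be applied directly to the `FLUX_*` fields of a streamed example.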
Most General
Most Specific
Independent models for every type of observation
Single model capable of processing all types of observations
Bytes Are All You Need (Horton et al. 2023)
AstroCLIP
Project led by Liam Parker, Francois Lanusse, Leopoldo Sarra, Siavash Golkar, Miles Cranmer
Accepted contribution at the NeurIPS 2023 AI4Science Workshop
Published in the Monthly Notices of the Royal Astronomical Society
Contrastive Language Image Pretraining (CLIP)
(Radford et al. 2021)
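The core of CLIP-style pretraining is a symmetric contrastive (InfoNCE) loss that pulls matched image/spectrum pairs together and pushes mismatched pairs apart. A minimal NumPy sketch of this objective (shapes and the temperature value are illustrative, not those of AstroCLIP):

```python
import numpy as np

def clip_loss(img_emb, spec_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image/spectrum pairs sit on the diagonal."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    spec = spec_emb / np.linalg.norm(spec_emb, axis=1, keepdims=True)
    logits = img @ spec.T / temperature  # (N, N) scaled cosine similarities

    def xent_diag(l):
        # cross-entropy of each row against its own (diagonal) positive pair
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average over the image->spectrum and spectrum->image directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
print(clip_loss(emb, emb))                        # matched pairs: low loss
print(clip_loss(emb, rng.normal(size=(8, 16))))   # random pairs: higher loss
```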
Cosine similarity search
Supervised baseline
Shared physical information about galaxies between images and spectra
=> We are building summary statistics for the physical parameters describing an object in a completely data-driven way
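The cosine-similarity search used above is straightforward once embeddings are in hand; a self-contained sketch (function name and shapes are ours):

```python
import numpy as np

def cosine_search(query, bank, k=5):
    """Return indices of the k bank embeddings most similar to the query."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q                   # cosine similarity to every bank entry
    top = np.argsort(-sims)[:k]    # highest similarity first
    return top, sims[top]

rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 32))        # embeddings of a retrieval set
idx, sims = cosine_search(bank[3], bank, k=5)
print(idx[0])  # 3 (the query retrieves itself first)
```

Because images and spectra are embedded into the same space, the query and bank can come from different modalities.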
PCA of patch features
Dense Semantic Segmentation
Dense Depth Estimation
AION-1
Project led by: Francois Lanusse, Liam Parker, Jeff Shen, Tom Hehir, Ollie Liu, Lucas Meyer, Leopoldo Sarra, Sebastian Wagner-Carena, Helen Qu, Micah Bowles, with extensive support from the rest of the team.
(Blanco Telescope and Dark Energy Camera. Credit: Reidar Hahn/Fermi National Accelerator Laboratory)
(Subaru Telescope and Hyper Suprime Cam. Credit: NAOJ)
(Dark Energy Spectroscopic Instrument)
(Sloan Digital Sky Survey. Credit: SDSS)
(Gaia Satellite. Credit: ESA/ATG)
Cuts: extended, full color griz, z < 21
Cuts: extended, full color grizy, z < 21
Cuts: parallax / parallax_error > 10
Field Embedding Strategy Developed for Multiple Physics Pretraining (McCabe et al. 2023)
DES g
DES r
DES i
DES z
HSC g
HSC r
HSC i
HSC z
HSC y
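The field-embedding idea from Multiple Physics Pretraining lets one model ingest a variable set of bands/fields by tagging projected values with a learned per-field vector. A rough sketch of the mechanism with made-up shapes and random stand-ins for learned weights (not the AION-1 implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
BANDS = ["DES-g", "DES-r", "DES-i", "DES-z",
         "HSC-g", "HSC-r", "HSC-i", "HSC-z", "HSC-y"]
D = 16  # embedding dimension

# one learned vector per band/field (random stand-ins for learned weights)
field_embedding = {b: rng.normal(size=D) for b in BANDS}
value_projection = rng.normal(size=D)  # shared projection of observed values

def embed_pixels(band, values):
    """Tag projected pixel values with the embedding of the band they came from."""
    values = np.asarray(values, dtype=float)
    return values[:, None] * value_projection + field_embedding[band]

tokens = embed_pixels("HSC-g", [0.1, 0.5, 2.0])
print(tokens.shape)  # (3, 16)
```

Because every band maps into the same token space, surveys with different filter sets (DES griz vs. HSC grizy) can be processed by a single backbone.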
Models trained as part of the 2024 Jean Zay Grand Challenge
following an extension to a new partition of 1400 H100s
Survey translation
Spectrum super-resolution
Redshift estimation
Conventional scientific workflow with deep learning
Conventional researchers @ CMU
Circa 2016
CMU DeepLens (Lanusse et al. 2017)
Foundation Model-based Scientific Workflow
Already taken care of
=> Let's discuss embedding-based adaptation
Regression example: galaxy property estimation, given the PROVABGS catalog (Hahn et al. 2023) as training data
from inquiry import Inquiry
from sklearn.neighbors import KNeighborsRegressor

# [Load training and testing data]

# Compute embeddings through the PolymathicAI API
client = Inquiry()
embeddings = client.embeddings(x_train, 'image-astroclip')

# Build regression model on top of the embeddings
model = KNeighborsRegressor().fit(embeddings, y_train)

# Perform inference on embeddings of the test set
test_embeddings = client.embeddings(x_test, 'image-astroclip')
preds = model.predict(test_embeddings)
Adaptation at low cost
with simple strategies:
x_train = Tokenize(hsc_images, modality='HSC')
model = FineTunedModel(base='Aion-B',
                       adaptation='AttentivePooling')
model.fit(x_train, y_train)
y_test = model.predict(x_test)
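The `AttentivePooling` adaptation named above is, as commonly implemented, a single learned query vector attending over the frozen backbone's token embeddings to produce one pooled summary. A NumPy sketch of that mechanism (all names and shapes here are illustrative, not the AION API):

```python
import numpy as np

def attentive_pool(tokens, query, Wk, Wv):
    """Pool a variable-length token sequence with a single learned query vector."""
    keys = tokens @ Wk                            # (N, D)
    values = tokens @ Wv                          # (N, D)
    scores = keys @ query / np.sqrt(query.size)   # scaled attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax attention weights
    return weights @ values                       # (D,) pooled summary

rng = np.random.default_rng(0)
D = 16
tokens = rng.normal(size=(10, D))  # stand-in for frozen backbone token embeddings
pooled = attentive_pool(tokens, rng.normal(size=D),
                        rng.normal(size=(D, D)), rng.normal(size=(D, D)))
print(pooled.shape)  # (16,)
```

Only the query and projection weights are trained, which keeps adaptation cheap relative to full fine-tuning.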
Trained on ->
Eval on ->
Segmenting central bar and spiral arms in galaxy images based on Galaxy Zoo 3D
Thank you for listening!