SCIENTIFIC ADVISORY GROUP
François Lanusse (CNRS / Flatiron Institute)
Colm-Cille Caulfield (University of Cambridge)
Leslie Greengard (Flatiron Institute / New York University)
David Ha (Sakana AI)
Yann LeCun (Meta AI / New York University)
Stéphane Mallat (École Normale Supérieure / Collège de France / Flatiron Institute)
David Spergel (Simons Foundation)
Olga Troyanskaya (Flatiron Institute / Princeton University)
Laure Zanna (New York University)
Our mission: to usher in a new class of machine learning for scientific data, building models that can leverage shared concepts across disciplines.
Language-like/less structured
Structured-data
Scientific Reasoning
Multi-Modality
Generalization to Data-Limited Domains
How can we build foundation models that jump across scientific disciplines?
Language-like/less structured
Structured-data
AstroCLIP
Cross-Modal Pretraining for Astronomical data
MPP
Multiple Physics Pretraining for Physical Surrogate Models
Scientific Reasoning
Multi-Modality
Generalization to Data-Limited Domains
xVal
A Continuous Number Encoding for LLMs
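The core idea of xVal can be sketched in a few lines: a number is not split into digit tokens but mapped to a single dedicated [NUM] token whose embedding is scaled by the numeric value, so magnitude lives along a continuous direction of embedding space. A minimal numpy illustration, where the vocabulary, dimensions, and lack of value normalization are placeholders rather than the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

# Tiny vocabulary with one dedicated [NUM] token (illustrative only).
vocab = {"the": 0, "mass": 1, "is": 2, "[NUM]": 3}
embeddings = rng.normal(size=(len(vocab), d_model))

def xval_encode(tokens, numbers):
    """Embed a token sequence; each [NUM] token's embedding is scaled
    by the corresponding numeric value (xVal-style continuous encoding)."""
    out, nums = [], iter(numbers)
    for tok in tokens:
        vec = embeddings[vocab[tok]].copy()
        if tok == "[NUM]":
            vec *= next(nums)  # magnitude enters as a continuous scale
        out.append(vec)
    return np.stack(out)

# "the mass is 2.5": one token slot, one continuous scale factor.
seq = xval_encode(["the", "mass", "is", "[NUM]"], [2.5])
```

Because all numbers share one embedding direction, the model never has to stitch magnitude together from digit tokens.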
Project led by Michael McCabe, Bruno Régaldo, Liam Parker, Ruben Ohana, Miles Cranmer
Accepted at NeurIPS 2024, Best paper award at the NeurIPS 2023 AI4Science Workshop
PDEBench systems (Takamoto et al. 2022): incompressible Navier-Stokes, compressible Navier-Stokes, shallow water, diffusion-reaction
Can we improve performance of surrogate models by pretraining on large quantities of easily simulatable systems?
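The pretraining idea can be caricatured in a few lines: fit one shared next-frame surrogate on trajectories pooled from several systems, then evaluate it on each. Here a toy 1-D heat equation at two diffusivities stands in for the PDEBench systems, and a linear least-squares predictor stands in for the MPP architecture; none of this is the actual model, just the shape of the experiment:

```python
import numpy as np

rng = np.random.default_rng(0)
nx = 32  # spatial grid points

def simulate(diffusivity, steps=200):
    """Toy periodic 1-D heat equation rollout, a stand-in for
    the far richer PDEBench simulations."""
    u = rng.normal(size=nx)
    frames = [u]
    for _ in range(steps):
        u = u + diffusivity * (np.roll(u, 1) - 2 * u + np.roll(u, -1))
        frames.append(u)
    return np.array(frames)

# "Multiple physics": pool trajectories from several systems.
systems = {"slow": simulate(0.1), "fast": simulate(0.4)}

# One shared next-frame predictor u_{t+1} ~ u_t @ W, fit on pooled data.
X = np.vstack([f[:-1] for f in systems.values()])
Y = np.vstack([f[1:] for f in systems.values()])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

for name, f in systems.items():
    nmse = np.mean((f[:-1] @ W - f[1:]) ** 2) / np.mean(f[1:] ** 2)
    print(f"{name}: normalized MSE = {nmse:.3e}")
```

The point of the pooled fit is that a single set of weights must serve every system, which is the property MPP scales up with transformers and many physical regimes.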
[Figure: normalized MSE, with a context size of 16 frames, on PDEBench compressible Navier-Stokes at M = 0.1 and M = 1.0]
Accepted at NeurIPS 2024 Datasets & Benchmark Track
Credit: Melchior et al. 2021
Credit: DESI collaboration/DESI Legacy Imaging Surveys/LBNL/DOE & KPNO/CTIO/NOIRLab/NSF/AURA/unWISE
Collaborative project with about 30 contributors
Accepted at NeurIPS 2024 Datasets & Benchmark track
Multiband images from Legacy Survey
Presented at NeurIPS 2024
Most General
Most Specific
Independent models for every type of observation
Single model capable of processing all types of observations
Bytes Are All You Need (Horton et al. 2023)
AstroCLIP
Project led by Francois Lanusse, Liam Parker, Leopoldo Sarra, Siavash Golkar, Miles Cranmer
Accepted contribution at the NeurIPS 2023 AI4Science Workshop
Published in Monthly Notices of Royal Astronomical Society
Contrastive Language-Image Pretraining (CLIP) (Radford et al. 2021)
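CLIP-style training aligns two encoders with a symmetric contrastive (InfoNCE) loss over a batch of paired observations; in AstroCLIP the pairs are images and spectra of the same galaxies. A minimal numpy sketch with random stand-in embeddings, where the batch size, dimension, and temperature are illustrative choices rather than the trained model's values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 16  # batch of paired observations, embedding dimension

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for image- and spectrum-encoder outputs of the same
# n galaxies (in AstroCLIP these come from trained networks).
img_emb = normalize(rng.normal(size=(n, d)))
spec_emb = normalize(img_emb + 0.1 * rng.normal(size=(n, d)))

def clip_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE loss: the matched pair (i, i) should be the
    most similar in both retrieval directions."""
    logits = (a @ b.T) / temperature
    idx = np.arange(len(a))
    def ce(l):  # cross-entropy with the diagonal as targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()
    return 0.5 * (ce(logits) + ce(logits.T))

loss = clip_loss(img_emb, spec_emb)
```

Minimizing this loss pulls the two views of each galaxy together while pushing apart views of different galaxies, which is what makes the shared embedding space useful downstream.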
Cosine similarity search
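Once both modalities live in a shared embedding space, retrieval reduces to a cosine similarity search against the gallery of embeddings. A small sketch with synthetic embeddings (sizes and the perturbation level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
gallery = rng.normal(size=(1000, 32))  # e.g. galaxy image embeddings
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

def most_similar(query, k=5):
    """Indices of the k gallery entries with the highest cosine
    similarity to the query embedding."""
    q = query / np.linalg.norm(query)
    scores = gallery @ q  # cosine similarity against the whole gallery
    return np.argsort(-scores)[:k]

# A slightly perturbed copy of entry 42 should retrieve entry 42 first.
query = gallery[42] + 0.05 * rng.normal(size=32)
hits = most_similar(query)
```

In the cross-modal setting the query can come from one encoder (say, a spectrum) and the gallery from the other (images), with no change to the search itself.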
Supervised baseline: estimates of galaxy properties from the PROVABGS catalog (Hahn et al. 2023), obtained via Bayesian spectral energy distribution (SED) modeling of DESI spectroscopy and photometry.
Metrics: R² of regression; negative log-likelihood of neural posterior inference.
Images and spectra share physical information about the same galaxies.
=> We are building summary statistics for the physical parameters describing an object in a completely data-driven way.
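One concrete reading of that claim: the frozen embeddings act as learned summary statistics, so even a simple regressor on top of them can recover physical parameters. A hypothetical sketch with synthetic embeddings that linearly encode two parameters, and ridge regression standing in for the downstream model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 32

# Hypothetical embeddings that encode two physical parameters
# (e.g. stellar mass and redshift) plus noise.
params = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, d))
embeddings = params @ mixing + 0.1 * rng.normal(size=(n, d))

# Ridge regression from frozen embeddings to the parameters.
lam = 1e-2
X_tr, y_tr = embeddings[:400], params[:400]
W = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

# Held-out R²: how much parameter variance the embeddings retain.
pred = embeddings[400:] @ W
resid = pred - params[400:]
r2 = 1 - (resid ** 2).sum() / ((params[400:] - params[400:].mean(0)) ** 2).sum()
```

The better the pretraining captures the underlying physics, the higher this held-out R² will be for parameters the model was never explicitly told about.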
Most General
Most Specific
Independent models for every type of observation
Single model capable of processing all types of observations
Bytes Are All You Need (Horton et al. 2023)
AstroCLIP
Early Fusion Multi-modal Data Models
Flamingo: a Visual Language Model for Few-Shot Learning (Alayrac et al. 2022)
Chameleon: Mixed-Modal Early-Fusion Foundation Models (Chameleon team, 2024)
Galaxy Image Segmentation
Walmsley & Spindler (2023)
Galaxy Image Deblending
=> Foundation Models that build a deep understanding of the data at the pixel level.
[Figure: input and reconstructed galaxy images]
Conditional Generation
Similarity search
Survey translation
Redshift estimation
Thank you for listening!