Towards Foundation Models for Science
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879242/Screenshot_20231008_220411.png)
Siavash Golkar
on behalf of Shirley Ho and the Polymathic AI team
What are foundation models?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10886697/pasted-from-clipboard.png)
Large models pre-trained on massive datasets
They can perform a variety of downstream tasks (zero-shot generalization)
They are good starting points for fine-tuning on data-poor domains (carry useful inductive bias)
The Key Ideas
- Large models pre-trained on task-agnostic objectives on massive & diverse datasets
- They can perform a variety of downstream tasks (zero-shot generalization)
- They are good starting points for fine-tuning on data-poor domains (carry useful inductive bias)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10886965/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10886966/pasted-from-clipboard.png)
Can we translate these innovations into a paradigm shift in machine learning for scientific applications?
Polymathic
Advancing Science through Multi‑Disciplinary AI
Our mission: to usher in a new class of machine learning for scientific data, building models that can leverage shared concepts across disciplines.
![](https://polymathic-ai.org/images/avatars/nyu.png)
![](https://polymathic-ai.org/images/avatars/cambridge.png)
![](https://polymathic-ai.org/images/avatars/schmidt.png)
![](https://polymathic-ai.org/images/avatars/berkeley.png)
Meet the Polymathic AI Team
- Colm-Cille Caulfield (University of Cambridge)
- Leslie Greengard (Flatiron Institute, New York University)
- David Ha (Sakana AI)
- Yann LeCun (Meta AI, New York University)
- Stephane Mallat (École Normale Supérieure, Collège de France, Flatiron Institute)
- David Spergel (Simons Foundation)
- Olga Troyanskaya (Flatiron Institute, Princeton University)
- Laure Zanna (New York University)
Our Resources
SCIENTIFIC ADVISORY GROUP
COMPUTING RESOURCES
- Internal H100 GPU resources at the Flatiron Institute
- External 500k GPU hours (V100 and A100)
- In the process of securing additional O(10^2) dedicated H100 GPUs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2668792/images/11185806/pasted-from-clipboard.png)
Thanks Ian!
The Foundation Model Spectrum
Scientific Reasoning
Multi-Modality
Generalization to Data-Limited Domains
How can we build foundation models that can have these properties?
- How many assumptions should we make about the structure of our inputs?
- What kind of embedding schemes should we use?
Different choices lead to foundation models with different strengths and weaknesses.
more efficient, less general
more general, less efficient
Language-like/less structured
Structured-data
The Foundation Model Spectrum
Language-like/less structured
Structured-data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879570/xval-splash.jpg)
xVal
A Continuous Number Encoding for LLMs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816057/pasted-from-clipboard.png)
AstroCLIP
Cross-Modal Pretraining for Astronomical data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879587/waves.jpg)
MPP
Multiple Physics Pretraining for Physical Surrogate Models
Scientific Reasoning
Multi-Modality
Generalization to Data-Limited Domains
more efficient, less general
more general, less efficient
MPP
Multiple Physics Pretraining for
Physical Surrogate Models
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816056/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879587/waves.jpg)
Project led by Michael McCabe, Bruno Régaldo, Liam Parker, Ruben Ohana, Miles Cranmer
Oral presentation at the NeurIPS 2023 AI4Science Workshop
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880624/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880638/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880646/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880650/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880654/pasted-from-clipboard.png)
Context
- Previous work on large domain-specific pretrained models: chemistry, medicine, astrophysics, climate, ...
  → here, an extension to surrogate modeling of spatiotemporal physical systems
- Spatiotemporal prediction is motivated by the need for faster surrogates for PDE solvers, and for systems that are hard to simulate with current models/hardware
- Situations where data is expensive…
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880733/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880738/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880742/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880746/pasted-from-clipboard.png)
Time
Ex: N-body simulation
Springel et al. 2005
The Foundation Model Spectrum
Language-like/less structured
Structured-data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879570/xval-splash.jpg)
xVal
A Continuous Number Encoding for LLMs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816057/pasted-from-clipboard.png)
AstroCLIP
Cross-Modal Pretraining for Astronomical data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879587/waves.jpg)
MPP
Multiple Physics Pretraining for Physical Surrogate Models
Scientific Reasoning
Multi-Modality
Generalization to Data-Limited Domains
Can we build a domain-specific foundation model that can be fine-tuned on only a few training examples?
Background
- Main pretraining strategies:
  - autoregressive prediction
  - masked reconstruction
  - contrastive learning
- No conditioning on physical parameters
- Spatiotemporal physics: PDEs from physical systems that typically carry conservation laws and symmetries…
  → suggests that there are learnable shared features

A natural choice for physical surrogate modeling.
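The autoregressive surrogate setting above can be sketched in a few lines. This is an illustrative stand-in, not the MPP implementation: `rollout` and the fixed-window convention are assumptions for the sketch, and `model` stands for any learned next-frame predictor.

```python
import numpy as np

def rollout(model, history, n_steps):
    """Autoregressive surrogate rollout: repeatedly predict the next
    field snapshot from a window of past snapshots and feed the
    prediction back in, as in next-frame pretraining for PDE data.
    `model` maps a (T, H, W) history to the next (H, W) frame."""
    frames = list(history)
    window = len(history)
    for _ in range(n_steps):
        nxt = model(np.stack(frames[-window:]))  # predict next frame
        frames.append(nxt)                       # feed prediction back in
    return np.stack(frames[window:])             # only the predicted frames
```

Rollout error compounds step by step, which is why long-horizon stability (see the Navier-Stokes discussion later) is a separate question from one-step accuracy.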
Physical Systems from PDEBench
Navier-Stokes
Incompressible
Compressible
Shallow Water
Diffusion-Reaction
Takamoto et al. 2022
Compositionality and Pretraining
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880419/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880424/pasted-from-clipboard.png)
Balancing objectives during training
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880431/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880435/pasted-from-clipboard.png)
Normalized MSE:
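The formula on the slide is an image; a common definition of the normalized MSE used to balance fields of very different magnitudes (the paper's exact normalization may differ) is, per field $u$:

```latex
\mathrm{NMSE}(\hat{u}, u) \;=\; \frac{\lVert \hat{u} - u \rVert_2^2}{\lVert u \rVert_2^2}
```

Dividing by the target norm puts pressure fields, velocities, and densities on a comparable scale, so no single variable dominates the multi-task training objective.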
Architecture for MPP
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880428/pasted-from-clipboard.png)
Experiment 1: Performance on Pretraining Tasks
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880439/pasted-from-clipboard.png)
Context size: 16 frames
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880446/pasted-from-clipboard.png)
Experiment 2: Transfer
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880454/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880455/pasted-from-clipboard.png)
Compressible Navier-Stokes
M = 0.1
M = 1.0
Fun Fact: Fine-tuning with VideoMAE works quite well…
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880461/pasted-from-clipboard.png)
Tube masking
ViT
ViT
ViT
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880472/pasted-from-clipboard.png)
Trained on reconstructing masked pixels on natural videos (SSV2 and K400)
Tong et al. 2022
Experiment 3: Broader Downstream Tasks
Regression Problems on Incompressible Navier-Stokes
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2560867/images/10880488/pasted-from-clipboard.png)
Mixed results
Long-time predictions on Navier-Stokes?
Conclusion
- A single pre-trained transformer model matches/outperforms specialized baselines for each in-distribution task.
- Demonstrated transfer capabilities (both near and far).
- Early days. We are in the process of scaling up to more diverse data.
- Open code and pretrained models: https://github.com/PolymathicAI/multiple_physics_pretraining
xVal
A Continuous Number Encoding for LLMs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816056/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879570/xval-splash.jpg)
Project led by Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti
Accepted contribution at the NeurIPS 2023 AI4Science Workshop
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2563653/images/10887176/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2563653/images/10887177/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2563653/images/10887179/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2563653/images/10887180/pasted-from-clipboard.png)
The Foundation Model Spectrum
Language-like/less structured
Structured-data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879570/xval-splash.jpg)
xVal
A Continuous Number Encoding for LLMs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816057/pasted-from-clipboard.png)
AstroCLIP
Cross-Modal Pretraining for Astronomical data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879587/waves.jpg)
MPP
Multiple Physics Pretraining for Physical Surrogate Models
Scientific Reasoning
Multi-Modality
Generalization to Data-Limited Domains
The problem: existing LLMs are not suitable for reliable zero-shot numerical operations.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2563653/images/10887195/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2563653/images/10887201/pasted-from-clipboard.png)
arXiv:2305.18654 [cs.CL]
arXiv:2109.03137 [cs.CL]
They make erratic, discontinuous predictions.
Even fine-tuning exhaustively does not grant out-of-distribution generalization abilities.
Can we build an LLM better suited for numerical data analysis?
xVal in a Nutshell: a continuous numerical encoding for language models
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10886959/pasted-from-clipboard.png)
xVal in a Nutshell: a continuous numerical encoding for language models
This encoding strategy has 3 main benefits:

- Continuity: the model is now end-to-end continuous by construction. (Standard LLMs are discontinuous at both the input and output stages.)
- Interpolation: it makes better out-of-distribution predictions than other numerical encodings.
- Efficiency: by using just a single token to represent any number, it requires less memory, compute, and training time to achieve strong results.
xVal in a Nutshell: a continuous numerical encoding for language models
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2563653/images/10887210/pasted-from-clipboard.png)
xVal shows improved predictions for out-of-distribution values.
xVal in a Nutshell: a continuous numerical encoding for language models
When evaluated on multi-digit multiplication tasks, xVal performs comparably well, and is less prone to large outliers:
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2563653/images/10887258/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2563653/images/10887259/pasted-from-clipboard.png)
And when evaluated on compound operations of basic arithmetic, xVal shows the strongest performance:
xVal in a Nutshell: a continuous numerical encoding for language models
Future directions: improving the dynamic range of the embedded values.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2563653/images/10887267/pasted-from-clipboard.png)
Conclusion
- xVal improves numerical analysis performance of transformer models.
- Good fit for numerical-data-rich texts.
- Not great for general-purpose LLMs (limited dynamic range; can't easily handle 13, 666, 1337, ...).
- Not suitable for heavy numerical computation.
- Code available: https://github.com/PolymathicAI/xVal
AstroCLIP
Cross-Modal Pre-Training for Astronomical Foundation Models
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816056/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816057/pasted-from-clipboard.png)
Project led by Francois Lanusse, Liam Parker, Siavash Golkar, Miles Cranmer
Accepted contribution at the NeurIPS 2023 AI4Science Workshop
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10887208/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10887209/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10887212/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10887214/pasted-from-clipboard.png)
The Data Diversity Challenge
- Scientific data is extremely multimodal.
- Observations from different instruments are not interchangeable.
- Metadata matters
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879695/42254_2021_353_Fig1_HTML.webp)
Credit: Melchior et al. 2021
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879711/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10880226/pasted-from-clipboard.png)
Credit:DESI collaboration/DESI Legacy Imaging Surveys/LBNL/DOE & KPNO/CTIO/NOIRLab/NSF/AURA/unWISE
The Foundation Model Spectrum
Language-like/less structured
Structured-data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879570/xval-splash.jpg)
xVal
A Continuous Number Encoding for LLMs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816057/pasted-from-clipboard.png)
AstroCLIP
Cross-Modal Pretraining for Astronomical data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879587/waves.jpg)
MPP
Multiple Physics Pretraining for Physical Surrogate Models
Scientific Reasoning
Multi-Modality
Generalization to Data-Limited Domains
Towards Large Multi-Modal Observational Models
Most General
Most Specific
Independent models for every type of observation
Single model capable of processing all types of observations
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816104/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10880548/pasted-from-clipboard.png)
Towards Large Multi-Modal Observational Models
Most General
Most Specific
Independent models for every type of observation
Single model capable of processing all types of observations
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816104/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10880548/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10880606/pasted-from-clipboard.png)
Bytes Are All You Need (Horton et al. 2023)
Towards Large Multi-Modal Observational Models
Most General
Most Specific
Independent models for every type of observation
Single model capable of processing all types of observations
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816104/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10880548/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10880606/pasted-from-clipboard.png)
Bytes Are All You Need (Horton et al. 2023)
AstroCLIP
Can we build a baseline multimodal model by bringing together specialized models trained on each modality?
Contrastive Learning in Astrophysics
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816081/pasted-from-clipboard.png)
Self-Supervised similarity search for large scientific datasets (Stein et al. 2021)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818084/pasted-from-clipboard.png)
How do we add in other modalities?
(e.g. spectral information?)
Example of Science Application: Identifying Galaxy Tidal Features
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10880641/pasted-from-clipboard.png)
What is CLIP?
Contrastive Language Image Pretraining (CLIP)
(Radford et al. 2021)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816072/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816074/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816085/pasted-from-clipboard.png)
One model, many downstream applications!
Flamingo: a Visual Language Model for Few-Shot Learning (Alayrac et al. 2022)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816075/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816077/pasted-from-clipboard.png)
Hierarchical Text-Conditional Image Generation with CLIP Latents (Ramesh et al. 2022)
The AstroCLIP approach
- We use spectra and multi-band images as our two different views for the same underlying object.
- DESI Legacy Surveys (g,r,z) images, and DESI EDR galaxy spectra.
- Once trained, we can do example retrieval by nearest neighbor search.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818084/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818219/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818762/pasted-from-clipboard.png)
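The nearest-neighbor retrieval step mentioned above is simple once both modalities live in the shared space. A minimal sketch, assuming precomputed embeddings (the function name and shapes are illustrative, not AstroCLIP's API):

```python
import numpy as np

def retrieve(query_emb, gallery_emb, k=3):
    """Cross-modal retrieval in the shared embedding space: rank
    gallery items (e.g. spectra) by cosine similarity to a query
    (e.g. an image embedding) and return the top-k indices."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = g @ q                    # cosine similarity to each gallery item
    return np.argsort(-sims)[:k]    # indices of the k most similar items
```

The same routine works image-to-image, spectrum-to-spectrum, or across modalities, since contrastive training aligns both encoders into one space.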
How we do it
We take a two-step approach:

- Build a self-supervised model separately for images and spectra
  - Images: start from the pre-trained ResNet-50 of Stein et al. (2021)
  - Spectra: pretrain a GPT-2-like transformer on spectra via masked modeling
- Train an embedding module on top of each backbone under the InfoNCE loss
  - Images: simple MLP
  - Spectra: cross-attention module
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818978/pasted-from-clipboard.png)
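The InfoNCE objective used in step two can be sketched in NumPy. This is a minimal sketch of the symmetric CLIP-style loss, not the AstroCLIP training code; the temperature value and batch convention are assumptions.

```python
import numpy as np

def info_nce(img_emb, spec_emb, temperature=0.07):
    """Symmetric InfoNCE (CLIP-style) loss. Matched image/spectrum
    pairs sit on the diagonal of the similarity matrix; every other
    pair in the batch acts as a negative."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    spec = spec_emb / np.linalg.norm(spec_emb, axis=1, keepdims=True)
    logits = img @ spec.T / temperature      # (B, B) similarity matrix
    labels = np.arange(len(logits))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()   # diagonal = matched pairs

    # average the image-to-spectrum and spectrum-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls matched pairs together and pushes mismatched pairs apart, which is what makes the later nearest-neighbor retrieval and k-NN probes work.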
More examples of retrieval
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818933/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818934/pasted-from-clipboard.png)
Image Similarity
Spectral Similarity
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818942/pasted-from-clipboard.png)
Image-Spectral Similarity
Visualizing properties of the embedding space
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818860/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818861/pasted-from-clipboard.png)
UMAP representation of spectra embeddings
Testing Structure of Embedding Space by k-NN Regression
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818872/z_pred_sp_embeddings.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818874/z_pred_im_sp_embeddings.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818877/z_pred_im_embeddings.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818084/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818896/m_pred_sp_embeddings.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818907/m_pred_im_sp_embeddings.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818912/m_pred_im_embeddings.png)
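The k-NN probe behind these panels is deliberately simple: if the embedding space organizes a physical property well, averaging the labels of nearby training points should already predict it. A minimal sketch (function name and shapes are illustrative):

```python
import numpy as np

def knn_regress(train_emb, train_y, query_emb, k=5):
    """k-NN regression in embedding space: predict a physical property
    (e.g. redshift or stellar mass) of a query galaxy as the mean over
    its k nearest training neighbors. A non-parametric probe of how
    well the embedding organizes that property."""
    dists = np.linalg.norm(train_emb - query_emb, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_y[nearest].mean()
```

Because the probe has no trainable parameters, any predictive power it shows must already be present in the geometry of the embeddings themselves.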
Comparison to image-only SSL (from Stein et al. 2021)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818923/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818924/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818877/z_pred_im_embeddings.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818912/m_pred_im_embeddings.png)
The Information Point of View
- The InfoNCE loss is a lower bound on the Mutual Information between modalities
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10818771/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10816085/pasted-from-clipboard.png)
Shared physical information about galaxies between images and spectra
Thinking about data from a hierarchical Bayesian model point of view
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10887153/pasted-from-clipboard.png)
=> We are building summary statistics for the physical parameters describing an object in a completely data-driven way
What comes next!
- We want to build a shared embedding space for all astrophysics observation modalities.
  - Next step: embed data from different instruments, different filters, etc., to build a universal embedding for all types of galaxy observations.
- Deeper interaction between the modalities, beyond alignment.
Teaser for what comes next
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10887163/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/866922/images/10879242/Screenshot_20231008_220411.png)
We are just getting started!
Thank you for listening!
Grand Challenges for Foundation Models in Science
- Scientific data is massively multi-modal (the set of modalities is not finite). Training a specialist model for each modality is not feasible as we grow the scope of our models.
- Transformers have good inductive biases for many tasks (vision, language, etc.), but they are not necessarily the best models for scientific foundation models (provably inefficient in specific cases).
- Many fundamental challenges remain, especially in physics (uncertainty estimates, interpretation, concept discovery, etc.). "Large data + large transformer" does not solve everything.
Towards Foundation Models for Science
By Siavash Golkar
Overview talk of the Polymathic AI Initiative