Federica B. Bianco

University of Delaware

Physics and Astronomy 

Biden School of Public Policy and Administration

Data Science Institute

 

Vera C. Rubin Observatory

Deputy Project Scientist - Construction

Interim Head of Science - Operations

  The symbiotic relationship between ML and the Physical sciences

APS MAS 2023

FASTLab

motivation

A two-way street
AI enables discoveries in astronomy;

astronomy will provide high-impact research problems and large, complex, and open datasets that will result in new AI breakthroughs

Historical perspective

1/6

what drives astronomy

Galileo Galilei 1610

Experiment driven

Einstein 1916

Theory driven | Falsifiability

Ulam 1947 (1947-today)

Simulations | Probabilistic inference | Computation

http://www-star.st-and.ac.uk/~kw25/teaching/mcrt/MC_history_3.pdf

the 2000s-today

Data | Survey astronomy | Computation | pattern discovery

@fedhere

from commissioning observations

to scanning the sky and giving away the data (open science model!)

Astronomy by the numbers

New stress on the infrastructure

Rubin LSST starting in 2025 expects:

~1000 images per night

10M alerts per night (5sigma changes)

17B stars, 30B galaxies Ivezic+19

200 quadruply-lensed quasars Minghao+19

~50 kilonovae Setzer+19, Andreoni+19   (+ ToO)

>10 interstellar objects

~10k SuperLuminous Supernovae Villar+ 2018

~50k Tidal Disruption Events Brickman+ 2020

~10 million QSO Mary Loli+21

Gartner report 2001

4-V of Big Data

V1: Volume

Number of bytes

Number of pixels

Number of astrophysical objects in a dataset x number of features measured

V2: Variety

Diverse science return from the same dataset, e.g. cosmology + stellar physics

Multiwavelength

Multimessenger

Images and spectra

V3: Velocity

Real time analysis, edge computing, data transfer

V4: Veracity

This V refers to both data quality and availability (added in 2012)

Inclusion of uncertainty in inference and simulations

Exquisite image quality

all over the sky

over and over again 

SDSS image circa 2000
HSC image circa 2018

when you look at the sky at this resolution and this depth...

everything is blended and everything is changing

[Figure axes: log number of Megapixels; Etendue (area x FoV)]

3.2 Gpix Rubin camera

learning by example

(supervised learning)

pattern discovery

(unsupervised learning)

The DOE LSST camera at the Vera C. Rubin Observatory has 3.2 gigapixels to scan the whole sky at high resolution every few nights

 

 

400 4K HD TVs to display a single LSST Camera image
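A rough check of this figure, as a sketch only: the 3840 x 2160 panel size is an assumption about what "4K HD TV" means here and is not stated on the slide.

```python
# Rough check: how many 4K screens does one 3.2-gigapixel image fill?
# Assumption: a "4K HD TV" is a 3840 x 2160 UHD panel.
CAMERA_PIXELS = 3.2e9
TV_PIXELS = 3840 * 2160  # about 8.3 million pixels per screen

n_tvs = CAMERA_PIXELS / TV_PIXELS
print(f"{n_tvs:.0f} screens at full resolution")
```

Under that assumption the ratio comes out slightly below 400, consistent with the rounded number quoted above.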

Historical perspective

"Data that stresses the infrastructure"

John R. Mashey Chief Scientist, SGI, mid-1990s

"Data that does not fit in memory"

Big Data in astronomy papers

(source: ADS)

- Big Data

- Data Science (x30)

- Artificial Intelligence (x10)

occurrence of term in Google-books corpus, 1996-2016, https://books.google.com/ngrams

Astronomical phenomena happen at all time scales and require federated observations to collect sufficient data to unravel the physics

cepheid

Astronomy’s Discovery Chain

Community Brokers

target observation managers

the astronomy discovery chain

Time domain astrophysics

When did the first review of neural networks in astronomy come out?

Smith+Geach May 2022 Astronomia ex machina

number of arXiv:astro-ph submissions with abstracts containing one or more of the strings: ‘machine learning’, ‘ML’, ‘artificial intelligence’, ‘AI’, ‘deep learning’ or ‘neural network’.

Extreme levels of automation

2/6

Discovery Engine

10M alerts/night

Community Brokers

target observation managers

the astronomy discovery chain

federica bianco - fbianco@udel.edu

Pitt-Google Broker

BABAMUL

The Automatic Learning for the Rapid Classification of Events (ALeRCE) Alert Broker

 F. Förster et al 2021 AJ 161 242

 Classification and rare event detection

3/6

High Energy Physics leading the way

1988

Simulation Based Inference

Physical parameters (particles)

Simulate hundreds of millions of particle interactions

Calculate P(data | physics) in all observational spaces

Statistical models of measurements performed by independent teams of scientists are combined a posteriori without loss of detail

=> the discovery of the Higgs boson

now developing NN to model summary statistics from high dimensional feature spaces
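The simulate-then-compare logic above can be sketched on a toy problem. Everything here is an illustrative assumption rather than an actual HEP analysis: the one-parameter Gaussian "simulator", the binning, and the grid scan are all hypothetical stand-ins. The sketch estimates P(data | physics) from histograms of simulated events and picks the parameter value that makes the observed data most likely.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n_events):
    """Toy simulator (hypothetical): one observable per event, drawn from N(theta, 1)."""
    return rng.normal(loc=theta, scale=1.0, size=n_events)

bins = np.linspace(-4.0, 8.0, 41)

# "Observed" data, generated with a parameter value the analysis does not know.
observed = simulate(2.0, 2_000)
obs_counts, _ = np.histogram(observed, bins=bins)

def log_likelihood(theta, n_sim=100_000):
    """Estimate log P(data | theta), up to a constant, from simulated events."""
    sim_counts, _ = np.histogram(simulate(theta, n_sim), bins=bins)
    # One pseudo-count per bin so that empty simulated bins do not give log(0).
    probs = (sim_counts + 1.0) / (sim_counts.sum() + len(sim_counts))
    return float(np.sum(obs_counts * np.log(probs)))

# Scan a grid of candidate parameter values and keep the most likely one.
theta_grid = np.linspace(0.0, 4.0, 21)
log_like = np.array([log_likelihood(t) for t in theta_grid])
theta_best = float(theta_grid[np.argmax(log_like)])
print(f"best-fit theta on the grid: {theta_best:.1f}")
```

A histogram works in one dimension; in high dimensional feature spaces the number of bins explodes, which is where learned summary statistics come in.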

The rise of Bayesian Deep Learning

Astronomical anomalies and rare classes

Forcing Serendipity

Sparse, unevenly sampled Kepler time series

2D T-SNE projection of feature space

Weirdness score

and yet...

discovered by eye
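One simple way to define a "weirdness score" in a feature space, as a sketch only: the score below is the mean distance to the k nearest neighbours, which is a swapped-in technique and not necessarily the method behind the figures on the slide. The feature table is synthetic and hypothetical.

```python
import numpy as np

def weirdness_scores(features, k=5):
    """Mean Euclidean distance to the k nearest neighbours in standardized feature space."""
    x = (features - features.mean(axis=0)) / features.std(axis=0)
    # Pairwise distances between all objects (fine for a few thousand objects).
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore each object's distance to itself
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)

rng = np.random.default_rng(1)
# Hypothetical feature table: 500 "ordinary" light curves described by 4 features...
features = rng.normal(size=(500, 4))
# ...plus one object placed far from the bulk of the population.
features = np.vstack([features, [8.0, -8.0, 8.0, -8.0]])

scores = weirdness_scores(features)
weirdest = int(np.argmax(scores))
print(f"weirdest object index: {weirdest}")
```

The injected object (index 500) gets the highest score; objects that are unusual in ways the features do not capture would not, which is one reason discoveries by eye still happen.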

Generative AI

4/6

SN classification from Spectra

Spectra reveal progenitors of stellar explosions

but spectra are expensive to take

 

 

Kaggle PLAsTiCC challenge

AVOCADO classifier

https://arxiv.org/abs/1907.04690

Classification from sparse data: Lightcurves

If you are trying to simulate the whole Universe... you are going to be computationally limited

Using AI to speed up simulations

by learning scale relations

 

Transfer Learning

from simulation to real data

 

Physics Informed Models

 (and go full circle back to Galaxy morphology classification with few-shot learning!)

Physics informed AI

5/6

Application regime:

PiNN

-infinity - 1950's

theory driven: little data, mostly theory, falsifiability and all that...

1980's - today

data driven: lots of data, drop theory and use associations, black-box models

lots of data yet not enough for entirely automated decision making

complex theory that cannot be solved analytically

 

combine it with some theory

PiNN

Non Linear PDEs are hard to solve!

  • Provide training points at the boundary with the calculated solution (trivial because we have boundary conditions)

  • Provide the physical constraint: make sure the solution satisfies the PDE via a modified loss function that includes the residuals of the prediction and the residual of the PDE

\mathrm{loss} = L_2 + \mathrm{PDE} = \sum (u_\theta - u)^2 + \left( \partial_t u_\theta + u_\theta \, \partial_x u_\theta - (0.01/\pi) \, \partial_{xx} u_\theta \right)^2
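The two-term loss can be sketched without a neural network. This is an illustration under stated assumptions, not a PiNN implementation: a real PiNN would differentiate the network output u_\theta with automatic differentiation, whereas here finite differences on a grid stand in for the derivatives, means replace the sums so that the two terms are comparable, and the test function u = x / (1 + t) is chosen only because it solves the equation exactly.

```python
import numpy as np

NU = 0.01 / np.pi  # viscosity coefficient from the loss written above

def pde_residual(u, dt, dx):
    """Residual u_t + u u_x - NU u_xx on interior grid points; u has shape (n_t, n_x)."""
    u_t = (u[2:, 1:-1] - u[:-2, 1:-1]) / (2.0 * dt)
    u_x = (u[1:-1, 2:] - u[1:-1, :-2]) / (2.0 * dx)
    u_xx = (u[1:-1, 2:] - 2.0 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dx**2
    return u_t + u[1:-1, 1:-1] * u_x - NU * u_xx

def pinn_loss(u_pred, u_known, known_mask, dt, dx):
    """Data misfit where the solution is known plus mean squared PDE residual."""
    data_term = np.mean((u_pred[known_mask] - u_known[known_mask]) ** 2)
    pde_term = np.mean(pde_residual(u_pred, dt, dx) ** 2)
    return data_term + pde_term

x = np.linspace(-1.0, 1.0, 101)
t = np.linspace(0.0, 1.0, 101)
dx, dt = x[1] - x[0], t[1] - t[0]
T, X = np.meshgrid(t, x, indexing="ij")

# u = x / (1 + t) solves the equation exactly; it plays the role of ground truth.
u_exact = X / (1.0 + T)

# Points where the solution is "given": the initial time and the two spatial boundaries.
known = np.zeros_like(u_exact, dtype=bool)
known[0, :] = True
known[:, 0] = True
known[:, -1] = True

loss_exact = pinn_loss(u_exact, u_exact, known, dt, dx)
# A candidate that matches the initial condition but ignores the time evolution.
u_frozen = np.broadcast_to(u_exact[0], u_exact.shape)
loss_frozen = pinn_loss(u_frozen, u_exact, known, dt, dx)
print(f"exact solution loss: {loss_exact:.2e}")
print(f"frozen profile loss: {loss_frozen:.2e}")
```

The exact solution drives both terms to (numerically) zero, while the frozen profile is penalized by the PDE term even at points where no training data exist; that penalty is the physical constraint at work.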

Thank you!
