University of Delaware

Department of Physics and Astronomy

Biden School of Public Policy and Administration

Data Science Institute

 

 

Rubin Legacy Survey of Space and Time

Deputy Project Scientist, Rubin Construction

Interim Head of Science, Rubin Operations

Applications of and opportunities for AI in the new era of time-domain astronomy

federica b. bianco

she/her

this slide deck is live at https://slides.com/federicabianco/AMLAW

 

The best way to view the slides is on the web (to see videos and animations). A flat (PDF) version of this deck would be largely diminished

explosions in the sky

how we study SNe

 

HELL YEAH!

 

2025

edge computing

Do we want more data??

 

Rubin LSST Transients by the numbers

SKA

(2025)

 

 

17B stars Ivezic+19

~10 million QSO Mary Loli+21

~50k Tidal Disruption Events Brickman+ 2020

~10k SuperLuminous Supernovae Villar+ 2018

~200 quadruply-lensed quasars Minghao+19, Ardense+24

~50 kilonovae Setzer+19, Andreoni+19   (+ ToO)

> 10 Interstellar Objects (?)

True Novelties!

 

well... it depends

2025

(2026)

Is the data gonna also be better?

thank you Yogesh!

thank you Arumina!

visualization and concept credit: Alex Razim

Kaicheng Zhang et al  2016 ApJ 820 67

https://plasticc.org/data-release

deSoto+2024

Boone 2017

7% of LSST data

The rest

Data: PLAsTiCC

Model: salt2 (Guy+07) implemented with SNCOSMO (Barbary+2012)

lightcurves make really bad tensors

  • Variable sizes of data vectors
  • Uneven sampling
  • Different sampling at different wavelengths
  • Phase gaps can be months long over ~1 year relevant time scales
  • Multiple relevant time scales (Long, Short, Medium... need all memory)
  • Heteroscedastic errors
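
The issues listed above can be made concrete with a minimal sketch of one common workaround: zero-padding variable-length, multi-band lightcurves into a fixed-size array and carrying a boolean mask alongside it. The function name `pad_light_curves`, the dictionary keys, and the channel layout are illustrative assumptions, not part of PLAsTiCC or of any model shown in this deck.

```python
import numpy as np

BANDS = "ugrizy"  # LSST filter names, mapped to integer indices 0..5


def pad_light_curves(light_curves, max_len=None):
    """Pack variable-length lightcurves into one padded array plus a mask.

    Each lightcurve is a dict with equal-length sequences under the
    (hypothetical) keys "time", "flux", "flux_err", and "band".
    """
    if max_len is None:
        max_len = max(len(lc["time"]) for lc in light_curves)
    n = len(light_curves)
    # Channels: time since first epoch, flux, flux error, band index.
    data = np.zeros((n, max_len, 4), dtype=np.float32)
    mask = np.zeros((n, max_len), dtype=bool)  # True where data are real
    for i, lc in enumerate(light_curves):
        order = np.argsort(lc["time"])[:max_len]  # time-sort, then truncate
        k = len(order)
        if k == 0:
            continue
        t = np.asarray(lc["time"], dtype=float)[order]
        data[i, :k, 0] = t - t[0]
        data[i, :k, 1] = np.asarray(lc["flux"], dtype=float)[order]
        data[i, :k, 2] = np.asarray(lc["flux_err"], dtype=float)[order]
        data[i, :k, 3] = [BANDS.index(lc["band"][j]) for j in order]
        mask[i, :k] = True
    return data, mask


curves = [
    {"time": [59000.0, 59003.1, 59030.5], "flux": [1.0, 5.2, 2.1],
     "flux_err": [0.5, 0.4, 0.9], "band": ["g", "r", "g"]},
    {"time": [59001.2, 59000.2], "flux": [0.3, 0.1],
     "flux_err": [0.2, 0.2], "band": ["i", "z"]},
]
data, mask = pad_light_curves(curves)
print(data.shape, mask.sum(axis=1))
```

Padding solves only the variable-size problem. The uneven sampling, the per-band gaps, and the heteroscedastic errors are still present in the padded array, and the model has to be told (through the mask and the error channel) how to treat them.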

transient data AI ready (see Alex's talk)

Rohan Pattnaik+ 2025

Dhanpal+2022

thank you Shravan

thank you Rohan

Williamson+23

Barna+ 2017

Howell 2011

transient spectra are just painful

  • feature broadening -> blending
  • time evolving
  • redshift moves features in different regions of a detector (systematics)
  • in the LSST era we work at the limit of SNR
  • we do not have nearly enough glass in the sky to collect spectra!

time-domain spectra are just painful

federica bianco - fbianco@udel.edu

Rubin will see ~1000 SN every night!

Credit: Alex Gagliano  IAIFI fellow MIT/CfA

Rubin Observatory

Site: Cerro Pachon, Chile

Funding: US NSF + DOE

 

 

 

 

 

To be transformational simultaneously in the four scientific areas, Rubin needs:

1) a large telescope mirror to be sensitive - 8 m (6.7 m effective aperture), deep survey

2) a large field of view for sky-scanning speed - 10 deg^2, wide survey

3) high spatial resolution, high quality images - 0.2''/pixel, exquisite image quality

4) image processing in real time and offline to produce live alerts and catalogs of all 37B objects - massive time-domain dataset

 

Rubin Observatory Status

@fedhere

September 2016

5 / 2019

May 2022 - Telescope Mount Assembly

 

12/2022 TMA in action

weight 2e5 kg, max slew rate 0.2 rad/s

Most of the weight in a 10m disk
Angular momentum: 5,000,000 kg m^2 s^-1

The DOE LSST Camera - 3.2 Gigapixel

3024 science raft amplifier channels

Camera and Cryostat integration completed at SLAC in May 2022,

Shutter and filter auto-changer integrated into camera body

LSSTCam undergoing final stages of testing at SLAC

July 2024: ComCam installed on the telescope after M1M2 installation. ComCam is a 144 Mpix version of LSSTCam

artist (me) impression of the first image taken by ComCam

Is the data gonna also be better?

magnitude limit single image r~24

magnitude limit 10 year stacks r~27

spatial resolution 0.2'' (seeing limited)

photometric precision 5mmag

photometric accuracy 10mmag

 

cadence.... that's a long story

At this level of precision, everything is variable, everything is blended, everything is moving.

SDSS

LSST-like HSC composite

Field of View': 9.6 sq deg
Image resolution': 0.2'' (seeing limited)
DDFs': 5 DDF
Standard visit': 30 sec
Photometric precision': 5 mmag
Photometric accuracy': 10 mmag
Astrometric precision': 10 mas
Astrometric accuracy': 50 mas

' requirement: ls.st/srd

*simulation pstn-054.lsst.io

SDSS 2x4 arcmin sq griz

MUSYC (Gawiser 2014) 1 mag shallower than LSST coadds

Photometric filters': u, g, r, i, z, y
saturation limit': ~15, 16, 16, 16, 15, 14
# visits*: 53, 70, 185, 192, 168, 165
mag single image*: 23.34, 23.2, 24.05, 23.55, 22.03
mag coadd*: 25.4, 26.9, 27.0, 26.5, 25.8, 24.9
Nominal cadence: 2-3 visits per night
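
As a rough cross-check of how the single-image limit (r~24) relates to the 10-year stack (r~27), here is a minimal sketch. It assumes every visit has the same depth and that the noise is background-dominated, so stacking N visits deepens the limit by 2.5 log10(sqrt(N)) = 1.25 log10(N) magnitudes. Both conditions are simplifying assumptions, the function name `coadd_depth` is illustrative, and reading 185 as the r-band visit count assumes the visit numbers above are listed in u, g, r, i, z, y order.

```python
import math


def coadd_depth(single_visit_depth, n_visits):
    """Idealized coadd depth: signal-to-noise grows as sqrt(n_visits)."""
    return single_visit_depth + 1.25 * math.log10(n_visits)


# r-band: single-image limit r~24, 185 visits over ten years.
r_coadd = coadd_depth(24.0, 185)
print(f"estimated r-band coadd depth: {r_coadd:.2f} mag")
```

Under these assumptions the estimate lands within a few tenths of a magnitude of the quoted r~27. Real coadds depart from the idealized scaling because seeing, sky brightness, and airmass vary from visit to visit.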

' requirement: ls.st/srd

*simulation pstn-054.lsst.io

Discovery

The lifecycle of a time-domain project is complex and fertile with opportunities for AI solutions

Discovery Engine

10M alerts/night

Community Brokers

target observation managers

the astronomy discovery chain

Pitt-Google Broker

BABAMUL

Graphic credit: Francisco Förster Burón

Augmentation and distribution

Discovery

Distribution

Classification 

Data Integration and Follow up

Ensemble Inference

Prediction

Discovery of Novelties

(A.K.A science!)

in <60 seconds:

Difference Image Analysis
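
A toy sketch of the idea behind difference image analysis: subtract a template from the search image and look for significant residuals. It uses synthetic images and skips everything that makes real DIA hard (image registration, PSF matching, flux scaling), so it is an illustration under stated assumptions and not the Rubin alert-production algorithm. The helper `gaussian_psf` and all numbers are made up for the example.

```python
import numpy as np


def gaussian_psf(shape, x0, y0, flux, sigma=1.5):
    """Render a circular Gaussian source with peak value `flux` at (x0, y0)."""
    y, x = np.indices(shape)
    return flux * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma**2))


rng = np.random.default_rng(42)
shape = (32, 32)

# Static sky: one star that is present at both epochs.
sky = gaussian_psf(shape, x0=8, y0=8, flux=200.0)

# Template and search share the static sky but have independent noise;
# only the search image contains the new transient at (x=20, y=12).
template = sky + rng.normal(0.0, 1.0, shape)
transient = gaussian_psf(shape, x0=20, y0=12, flux=100.0)
search = sky + transient + rng.normal(0.0, 1.0, shape)

# search - template = difference
difference = search - template

# Robust noise estimate (median absolute deviation), then a 5-sigma cut.
noise = 1.4826 * np.median(np.abs(difference - np.median(difference)))
detected = difference > 5 * noise

peak_y, peak_x = np.unravel_index(np.argmax(difference), shape)
print(f"brightest residual at x={peak_x}, y={peak_y}; "
      f"{detected.sum()} pixels above 5 sigma")
```

The static star cancels in the subtraction and only the transient survives the threshold. In real data, imperfect subtraction leaves artifacts at the positions of bright static sources, which is why a bogus-rejection step follows.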

Can we replace DIA with ANN?

TransiNet: Sedaghat + Mahabal 2017

in 60 seconds:

Difference Image Analysis + Bogus rejection

feature extraction + Random Forest

AUTOSCAN: Goldstein+ 2017

search - template = difference

96% accurate

Tatiana Acero-Cuellar, UNIDEL fellow, LSSTC data science fellow

search - template = difference

92% accurate

WORKING WITH RUBIN AP TEAM TO DEVELOP THE ML-RELIABILITY SCORE OF RUBIN ALERTS

What is the network learning?

What can we learn from the AI?

search

template

difference

Interpretable AI

Robust AI

Anomaly detection

The Automatic Learning for the Rapid Classification of Events (ALeRCE) Alert Broker

 F. Förster et al 2021 AJ 161 242

AI tasks

Distribution

Classification 

Data Integration and Follow up

Ensemble Inference

Prediction

Discovery of Novelties

(A.K.A science!)

Discovery

Photometric Classification of transients

Kepler satellite EB

LSST (simulated) EB

Kaggle PLAsTiCC challenge

AVOCADO classifier

https://arxiv.org/abs/1907.04690

Classification from sparse data: Lightcurves

The PLAsTiCC challenge winner, Kyle Boone, was a grad student at Berkeley and did not use a neural network!

 

He won $2,000

without redshift

with redshift

Methodological issues with these approaches

CNNs are not designed to ingest uncertainties. Passing them as an extra image layer "works", but it is not clear why, since the convolutions over the flux and error channels are combined after the first layer

 

 

 

Methodological issues with these approaches

Gaussian processes work by imposing a kernel that represents the covariance in the data (how data depend on time or time/wavelength). Imposing the same kernel for different time-domain phenomena is incorrect in principle

 

=> bias toward known classes!
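
To make the role of the imposed kernel concrete, here is a minimal NumPy sketch of Gaussian process regression on an unevenly sampled lightcurve. The squared-exponential kernel, the amplitude, and the two length scales are illustrative assumptions and not the kernel used by any classifier mentioned in this deck. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(t1, t2, amplitude, length_scale):
    """Squared-exponential covariance between two sets of times."""
    dt = t1[:, None] - t2[None, :]
    return amplitude**2 * np.exp(-0.5 * (dt / length_scale) ** 2)


def gp_predict(t_obs, flux, flux_err, t_new, amplitude, length_scale):
    """GP posterior mean and standard deviation at the times t_new."""
    # Heteroscedastic errors enter as a per-point variance on the diagonal.
    K = rbf_kernel(t_obs, t_obs, amplitude, length_scale) + np.diag(flux_err**2)
    K_star = rbf_kernel(t_new, t_obs, amplitude, length_scale)
    mean = K_star @ np.linalg.solve(K, flux)
    cov = (rbf_kernel(t_new, t_new, amplitude, length_scale)
           - K_star @ np.linalg.solve(K, K_star.T))
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std


# Sparse lightcurve with a 30-day gap (times in days, arbitrary flux units).
t_obs = np.array([0.0, 3.0, 4.0, 10.0, 40.0, 45.0])
flux = np.array([0.1, 0.6, 0.8, 1.0, 0.3, 0.2])
flux_err = np.array([0.05, 0.05, 0.1, 0.05, 0.1, 0.05])
t_new = np.array([3.5, 25.0])  # one well-sampled epoch, one inside the gap

mean_short, std_short = gp_predict(t_obs, flux, flux_err, t_new, 1.0, 5.0)
mean_long, std_long = gp_predict(t_obs, flux, flux_err, t_new, 1.0, 50.0)
print("uncertainty in the gap, 5-day kernel: ", std_short[1])
print("uncertainty in the gap, 50-day kernel:", std_long[1])
```

The same six points give very different predictions inside the gap depending only on the length scale that was imposed. That choice encodes a belief about how fast the source evolves, which is the mechanism by which a single kernel favors the classes it was tuned on.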

 

 

Neural processes replace the imposed kernel with a learned one - ask Siddharth Chaini!

Dr. Somayeh Khakpash

LSSTC Catalyst Fellow, Rutgers

Rare classes will become common, but how do we know what we are looking at and classify different objects for sample studies?

Data-Driven Photometric Templates for stripped-envelope supernovae (SESN)

 

on the job market!

Khakpash et al. 2024 ApJS https://arxiv.org/pdf/2405.01672

FASTlab Flash highlight

Methodological issues with these approaches

Attention requires positional encoding
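
Self-attention is permutation invariant, so the order (and, for lightcurves, the timing) of the observations has to be injected explicitly. A minimal sketch, assuming the standard sinusoidal encoding evaluated at the actual observation times rather than at integer positions. That substitution is one option among several and is an assumption here, as are the function name and the default values of `d_model` and `max_period`.

```python
import numpy as np


def time_positional_encoding(times, d_model=8, max_period=10000.0):
    """Sinusoidal encoding evaluated at real-valued observation times.

    `d_model` must be even: half the columns are sines, half cosines.
    """
    times = np.asarray(times, dtype=float)
    i = np.arange(d_model // 2)
    freqs = 1.0 / max_period ** (2 * i / d_model)  # geometric frequency ladder
    angles = times[:, None] * freqs[None, :]
    enc = np.zeros((len(times), d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc


# Unevenly spaced epochs, in days since the first detection.
enc = time_positional_encoding([0.0, 0.02, 3.1, 45.7])
print(enc.shape)
```

With integer positions, two observations 30 minutes apart and two observations 40 days apart would look equally "adjacent" to the model. Encoding the time itself keeps that information.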

 

we badly need better benchmark datasets

Hlozek et al, 2020

DATA CURATION IS THE BOTTLENECK

the models contributed by the community came with issues:

- different formats (spectra, lightcurves, theoretical, data-driven)

- the people who contributed the models were included in one paper at best

- incompleteness

- systematics

- imbalance

 

 

 

Khakpash+ 2024 showed that the models were biased for SN Ibc

AVOCADO, SCONE: all these models are trained on a biased dataset and are currently being used for classification

 

Ibc data-driven templates vs PLAsTiCC

 

Ic templates vs ELAsTiCC

 

survey optimization

multimodal data analysis

and pixel to science

why not images too? 

LSST data products

Time Domain Science

Static Science

Alerts based

Catalog based

Deep stack based

Rubin Observatory LSST Data Products

data rights holders only

Rubin In-Kind Contribution Program

https://www.lsst.org/scientists/in-kind-program

world public!

10 million alerts per night!
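
A back-of-the-envelope sketch of what 10 million alerts per night means per visit. The 30-second standard visit comes from the requirements quoted earlier in the deck. The 10-hour night and the 6 seconds of overhead between visits are assumptions made only for this estimate, and the function name is illustrative.

```python
def alerts_per_visit(alerts_per_night=10_000_000, night_hours=10.0,
                     visit_seconds=30.0, overhead_seconds=6.0):
    """Return (visits per night, average alerts per visit)."""
    visits = night_hours * 3600.0 / (visit_seconds + overhead_seconds)
    return visits, alerts_per_night / visits


visits, per_visit = alerts_per_visit()
print(f"~{visits:.0f} visits per night, ~{per_visit:.0f} alerts per visit")
```

Under these assumptions each visit produces on the order of ten thousand alerts, all of which have to be generated and distributed within the 60-second budget.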

LSST survey strategy optimization

Exploring the Transient and Variable Optical Sky

LSST Science Book (2009)

Rubin LSST survey design

Rubin has involved the community to an unprecedented level in survey design: this is a uniquely "democratic" process!

2024

ethics of AI in astro

the butterfly effect

NGC 4565 is an edge-on spiral galaxy about 30 to 50 million light-years away. The faculty at the IUCAA used an AI model (emulator) to predict the hidden physical parameters of the galaxy, wrongly estimated the DM content of NGC 4565, and claimed that a novel process for galaxy formation should be taken into consideration.

 

Unfortunately, this was the result of a model hallucination.

The galaxy was featured in many social media posts, gaining rapid notoriety, but upon retraction it was canceled. The galaxy is suing IUCAA, claiming emotional damage and loss of revenue.

We use astrophysics as a neutral and safe sandbox to learn how to develop and apply powerful tools.

Deploying these tools in the real world can do harm.

Ethics of AI is essential training that all data scientists should receive.

Why does this AI model whiten Obama's face?

Simple answer: the data is biased. The algorithm is fed more images of white people

But really, would the opposite have been acceptable? The bias is in society

models are neutral, the bias is in the data (or is it?)

Joy Buolamwini

thank you!

 

University of Delaware

Department of Physics and Astronomy

 

Biden School of Public Policy and Administration

Data Science Institute

federica bianco

Rubin Construction

Deputy Project Scientist

fbianco@udel.edu
