University of Delaware
Department of Physics and Astronomy
Biden School of Public Policy and Administration
Data Science Institute
Rubin Legacy Survey of Space and Time
Deputy Project Scientist for Construction
LSST Survey Scientist
federica b. bianco
she/her
Grad student
Since 2019 we study the sky (and more!) mostly with AI
Postdoc
LSST: The Vera C. Rubin Observatory Legacy Survey of Space and Time
20 TB of data every night. That is equivalent to
8,000 high definition movies
4,000 hours of TikTok videos
every night for 10 years
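A quick back-of-the-envelope check of these numbers, as a sketch only: it assumes decimal terabytes and, as an upper bound, data on every calendar night for 10 years (weather and downtime will reduce the real total).

```python
# Back-of-the-envelope data volume, using the numbers quoted on the slide.
TB = 1e12  # bytes in a decimal terabyte (assumption: decimal units)

nightly_bytes = 20 * TB
movies_per_night = 8_000        # from the slide
tiktok_hours_per_night = 4_000  # from the slide

# Implied size of one HD movie and of one hour of video, in GB
gb_per_movie = nightly_bytes / movies_per_night / 1e9
gb_per_video_hour = nightly_bytes / tiktok_hours_per_night / 1e9

# Upper bound on the 10-year total, in petabytes
# (assumes observing on every calendar night, which is optimistic)
nights = 365 * 10
total_pb = nightly_bytes * nights / 1e15

print(f"{gb_per_movie:.1f} GB per movie, {gb_per_video_hour:.1f} GB per video hour")
print(f"up to {total_pb:.0f} PB of raw data in 10 years")
```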
what's in a name?
The first ground-based national US observatory named after a woman, Dr. Vera C. Rubin
Building an unprecedented catalog of Solar System Objects
LSST Science Drivers
Mapping the Milky Way and Local Volume
Probing Dark Energy and Dark Matter
Exploring the Transient Optical Sky
Objective: to provide a science-ready dataset to transform the 4 key science areas
To accomplish this, we need:
1) Dark skies - Cerro Pachón, Chile
2) A large telescope mirror to be sensitive - 8m (6.7m)
May 2022 - Telescope Mount Assembly
3.2 Gigapixels:
We built the largest (declassified) camera ever built
to look farther and wider into the sky than ever before
0.2'' / pixel, 6 filters (ugrizy)
1996-1998 Tony Tyson, Roger Angel
How it started
with Zohran Mamdani,
Astronaut Reid Wiseman,
Activist Zabib Musa Loro,
Pope Leo XIV,
Olympic medalist Alysa Liu,
Benicio Del Toro.......
2008
2017
Are We There YET????!!!!
Eye to the sky…on-sky engineering tests have begun at
Rubin Observatory using the world’s largest digital camera!
June 23 2025
First Look party here at UD with 213 people signed up!
Combination of 678 separate images taken in just over seven hours of observing time, showing the Trifid nebula (top right) and the Lagoon nebula, which are several thousand light-years away from Earth. | NSF-DOE Vera C. Rubin Observatory
The Vera C. Rubin Observatory Data Preview 1
June 30, 2025
DP1 release!
>25% DIA detections
new AGNs/TDEs
1 AGN
Galactic variables
10 stars explode in the universe every second
Until the 1900s we would see 1 in a century
Until the 1980s we would see 1 in a decade
Until the 2010s we would see 1 in a month
With the Vera C. Rubin Observatory we will see 1000 every night!
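To put these rates on a common scale, here is a small worked calculation using only the numbers quoted above. It is a rough sketch: it assumes Rubin observes every night of the year, which overestimates the yearly total.

```python
# Discovery rates from the slide, converted to a common unit (per year).
SECONDS_PER_DAY = 86_400

# "10 stars explode in the universe every second"
explosions_per_day = 10 * SECONDS_PER_DAY

discoveries_per_year = {
    "until the 1900s": 1 / 100,  # 1 per century
    "until the 1980s": 1 / 10,   # 1 per decade
    "until the 2010s": 12,       # 1 per month
    "Rubin": 1000 * 365,         # 1000 per night (assumes every night is observed)
}

# Fraction of all explosions that 1000 discoveries per night represents
fraction_seen = 1000 / explosions_per_day

for era, rate in discoveries_per_year.items():
    print(f"{era:>16}: {rate:>12,.2f} per year")
print(f"Rubin would catch about {fraction_seen:.2%} of all explosions")
```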
2025
edge computing
more data
but also
information-rich data
SKA
(2025)
edge computing
17B stars (x10) Ivezic+19
~10 million QSO (x10) Mary Loli+21
~50k Tidal Disruption Events (from ~150) Bricman+ 2020
~10k SuperLuminous Supernovae (from ~200) Villar+ 2018
~400 strongly lensed SN Ia (from 10) Arendse+24
~50 kilonovae (from 2) Setzer+19, Andreoni+19 (+ ToO)
> 10 Interstellar Objects (from 2... ?)
True Novelties!
Other surveys will be even larger!
"BUT BIG DATA DOES NOT MEAN BIG SCIENCE"
Yang Huang,
University of Chinese Academy of Sciences
SpecCLIP talk @UNIVERSAI
IAU workshop Greece June 2025
rapid, accurate alert release
data rights holders only
world public!
10 million alerts per night!
in 60 seconds:
Difference Image Analysis
template
difference image
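The core idea of Difference Image Analysis can be sketched in a few lines. This is a toy illustration on synthetic data, not the Rubin pipeline: real DIA also matches the point spread function and photometric scale of the two images before subtracting, which is skipped here. All image sizes, fluxes, and noise levels below are made-up assumptions.

```python
import numpy as np

def gaussian_source(shape, center, sigma, flux):
    """Circular Gaussian source with the given total flux."""
    y, x = np.indices(shape)
    r2 = (y - center[0]) ** 2 + (x - center[1]) ** 2
    profile = np.exp(-r2 / (2 * sigma ** 2))
    return flux * profile / profile.sum()

rng = np.random.default_rng(0)
shape = (41, 41)
noise_sigma = 1.0

# Static sky: flat background plus a host galaxy (hypothetical values)
static_sky = 100.0 + gaussian_source(shape, (20, 20), 4.0, 5000.0)

# Template: deep coadd of the same field, so lower noise than a single visit
template = static_sky + rng.normal(0, noise_sigma / 3, shape)

# Search image: same sky plus a new transient and single-visit noise
transient_pos = (12, 27)
search = (static_sky
          + gaussian_source(shape, transient_pos, 1.5, 400.0)
          + rng.normal(0, noise_sigma, shape))

# search - template = difference: the static sky cancels, the transient remains
difference = search - template

# Per-pixel significance, combining the noise of both images
diff_sigma = np.sqrt(noise_sigma ** 2 + (noise_sigma / 3) ** 2)
significance = difference / diff_sigma
peak = np.unravel_index(np.argmax(significance), shape)
detected = significance[peak] > 5  # simple 5-sigma threshold

print("brightest difference pixel:", peak, "detected:", bool(detected))
```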
INFRASTRUCTURE FOR DM IS CPU BASED
search - template = difference
96% accurate
Tatiana Acero-Cuellar, UNIDEL fellow, LSSTC data science fellow
search - template = difference
92% accurate
Tatiana Acero-Cuellar, UNIDEL fellow, LSSTC data science fellow
Improving the efficiency of transient detections with Neural Networks
Saliency maps: what pixels matter?
[Figure panels: search, template, difference]
95% accurate
Acero-Cuellar et al. DESC submitted
Tatiana Acero-Cuellar
UNIDEL fellow,
LSST Data Science Fellow
FASTlab Flash highlight
What is the network learning?
What can we learn from the AI?
The Rubin LSST ML-Reliability Score (aka real-bogus)
accuracy 98.06%, purity 97.87%, completeness of 98.27%... on simulated data
- requires instantaneous inference
- limited computational resources (CPU)
- evolving data quality
- limited ground truth data (e.g. no variable stars in training)
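For reference, the three quoted quantities can be computed from a labeled sample as below. This is a generic sketch on made-up labels and scores, not the Rubin reliability model; purity is what machine learning calls precision, and completeness is recall. The 0.5 score threshold is an arbitrary assumption.

```python
import numpy as np

def real_bogus_metrics(is_real, predicted_real):
    """Accuracy, purity (precision), and completeness (recall) for a binary classifier."""
    is_real = np.asarray(is_real, dtype=bool)
    predicted_real = np.asarray(predicted_real, dtype=bool)
    tp = np.sum(is_real & predicted_real)    # real, called real
    tn = np.sum(~is_real & ~predicted_real)  # bogus, called bogus
    fp = np.sum(~is_real & predicted_real)   # bogus, called real
    fn = np.sum(is_real & ~predicted_real)   # real, called bogus
    return {
        "accuracy": (tp + tn) / is_real.size,
        "purity": tp / (tp + fp),        # undefined if nothing is called real
        "completeness": tp / (tp + fn),  # undefined if there are no real sources
    }

# Hypothetical ground truth (1 = real) and classifier scores
is_real = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05])

metrics = real_bogus_metrics(is_real, scores > 0.5)
print(metrics)
```

Raising the score threshold trades completeness for purity, which is one way to read a "conservative" reliability cut.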
DP1
tests on injection in COSMOS fields
alerts
POI/Variables
AMPEL
ALeRCE
ALERTS HAVE STARTED!
Why not always? We deliver alerts on DDF+Virgo when we run the survey in survey mode (=not engineering tests)
Why not everywhere? Limited templates available - template footprint increasing
~7M alerts per night
*Conservative ML-reliability scores for now because the infrastructure (Rubin and brokers) is still under development
classifications by ALeRCE and Lasair
survey optimization
Discovery Engine
10M alerts/night
Community Brokers
target observation managers
BABAMUL
Operation Simulator (OpSim)
simulates the catalog of LSST observations + observation properties
Metrics Analysis Framework (MAF)
Python API to interact with OpSim outputs, quantifying performance on a science case with a metric
Lynne Jones
Peter Yoachim
~100s simulations
~1000s MAFs
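Conceptually, a metric is a function that maps the simulated visits at one sky position to a single number that a science case cares about. The sketch below is a toy stand-in written in plain NumPy; it does not use the actual MAF API, and the function name and visit times are hypothetical.

```python
import numpy as np

def median_internight_gap(mjd):
    """Toy metric: median gap in days between consecutive observing nights.

    Nights are approximated by the integer part of the MJD (a simplification).
    """
    nights = np.unique(np.floor(np.asarray(mjd, dtype=float)))
    if nights.size < 2:
        return float("nan")
    return float(np.median(np.diff(nights)))

# Hypothetical visit times (MJD) at one sky position under two toy cadences,
# each with pairs of visits on three nights
cadence_a = [60000.10, 60000.12, 60005.10, 60005.12, 60010.10, 60010.12]
cadence_b = [60000.10, 60000.12, 60002.10, 60002.12, 60012.10, 60012.12]

gap_a = median_internight_gap(cadence_a)
gap_b = median_internight_gap(cadence_b)
print(f"cadence A: {gap_a} days, cadence B: {gap_b} days")
```

Evaluating one such function over every sky position of every simulation is what makes it possible to compare hundreds of candidate survey strategies on equal footing.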
Rubin has involved the community to an unprecedented level in survey design: this is a uniquely "democratic" process!
85% of submissions led by SC members
Survey Cadence Optimization Committee
[Figure sequence labeled 2017, 2019, 2023, 2024; axis tick 80,000]
[Figure axes: ← Dimmer, Brighter →; ticks 0.01, 0.1, 1, 10, 100]
TDA =>
~800 visits per field
10 seasons of ~6 months each
2 visits per night (within ~30 min for Solar System science)
revisit time => ~4.5 nights
This will scatter significantly (weather, moon, ...)
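The revisit time follows from simple arithmetic on the numbers above. A minimal sketch, assuming a "6 month" season is half a calendar year and that visits are spread evenly over the observable nights:

```python
# Average revisit time implied by the numbers on the slide.
visits_per_field = 800
visits_per_night = 2
seasons = 10
season_length_days = 365.25 / 2  # "6 months" (assumption: half a year)

nights_with_visits = visits_per_field / visits_per_night  # nights a field is observed
observable_days = seasons * season_length_days            # days a field is reachable
revisit_time = observable_days / nights_with_visits       # average gap between visited nights

print(f"{nights_with_visits:.0f} observed nights over {observable_days:.0f} observable days")
print(f"average revisit time: {revisit_time:.1f} nights")
```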
The original survey plan didn't lead to good time domain astronomy (TDA) outcomes:
2 intranight obs in same filter +
2 intranight obs in another filter ~5 days later
From Law et al. 2021
[Figure axes: number of pairs of observations (1e5) vs. time gaps (days)]
Eric C. Bellm et al 2022 ApJS 258 13
Proposed 3 intranight obs
2 within 1 hour in different filters
1 at 4-8 hours separation with a repeated filter
Intranight color (near instantaneous)
Intranight rate of change (~hour time scales)
Presto-Color, Bianco+ 2019
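How three observations in one night yield both a color and a rate of change can be sketched as follows. The magnitudes, times, and filter choices are invented for illustration, and the function name is hypothetical.

```python
def intranight_color_and_rate(t1, m1, m_other, t3, m3):
    """Color and rate of change from three same-night observations.

    t1, m1   : time (hours) and magnitude of the first observation
    m_other  : magnitude in a different filter, taken within ~1 hour of t1
    t3, m3   : time and magnitude of the repeat observation in the first filter
    """
    color = m1 - m_other         # treated as instantaneous (pair within 1 hour)
    rate = (m3 - m1) / (t3 - t1)  # mag per hour; negative means brightening
    return color, rate

# Hypothetical transient: g at 0 h, i shortly after, g again 6 hours later
color_g_i, rate_g = intranight_color_and_rate(
    t1=0.0, m1=21.30, m_other=21.05, t3=6.0, m3=20.70
)
print(f"g - i = {color_g_i:.2f} mag, dg/dt = {rate_g:.2f} mag/hour")
```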
Current plan: rolling 8 out of the 10 years
[Figure axes: newer simulations ->; <- bad, good ->]
4 – 24 hour gaps between epochs will enable kilonova parameter estimation
Andreoni+ 2022a
Real Observatory Performance??
TDA is most sensitive to efficiency and cannot recover by just adding survey time
https://pstn-056.lsst.io/
Proposed reduction to 6 rolling years (three 2-year cycles) to improve intrasurvey uniformity
8y rolling
no rolling
6y rolling
~7% loss in KN characterization
Transients and Variable Stars SC
TVS Science Collaboration
Join TVS! No fees, no minimum requirements
Chairs: Igor Andreoni,
Sara Bonito
Shar Daniels
NSF Graduate Student Fellow
University of Delaware
Fast Transient Subgroup
Fast Transients Metrics Coordination and White Paper
Continuous-readout astronomical images for rapid transients
CHECK OUT THE POSTER!
Challenge
follow up
Rubin will see ~1000 SN every night!
Credit: Alex Gagliano IAIFI fellow MIT/CfA
KIC 3858884: A hybrid δ Scuti pulsator in a highly eccentric eclipsing binary Maceroni+2014
Kepler EB
LSST (simulated) EB
LSST Deep Drilling Fields
LSST Wide Fast Deep (main survey)
Photometric classification of subtypes may simply not be possible
Spectroscopic classification at scale will not be possible
Rest+Andreoni
Fortino
When they go high, we go low... spectral classification at low resolution
Astrophysical spectra require the capture of enough photons at each wavelength:
large telescopes
long exposure times
bright objects
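The trade-off behind going to low resolution can be illustrated with photon counting. This is a sketch under the assumption of pure Poisson (photon) noise: summing neighboring wavelength pixels lowers the resolution but raises the signal-to-noise per bin by the square root of the binning factor. The counts below are made up.

```python
import numpy as np

def rebin(counts, factor):
    """Sum photon counts in groups of `factor` adjacent wavelength pixels."""
    counts = np.asarray(counts)
    n = (len(counts) // factor) * factor  # drop the incomplete last bin
    return counts[:n].reshape(-1, factor).sum(axis=1)

def poisson_snr(counts):
    """Signal-to-noise for photon-limited data: N / sqrt(N)."""
    return counts / np.sqrt(counts)

# Hypothetical faint spectrum: 25 photons in each of 64 native pixels
native = np.full(64, 25.0)
binned = rebin(native, 16)  # 16x lower resolution

snr_native = poisson_snr(native)[0]
snr_binned = poisson_snr(binned)[0]
print(f"S/N per pixel: {snr_native:.0f} native, {snr_binned:.0f} after 16x binning")
```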
Willow Fox Fortino
UDelaware
When they go high, we go low
Classification power vs spectral resolution for SNe subtypes
FASTlab Flash highlight
Adapting Transformer architecture (Vaswani et al. 2017)
Classification from sparse data: Lightcurves
Vaswani+ 2017, Attention Is All You Need
AI was transformed in 2017 by this paper
data embedding
classification head
A new AI-based classifier for SN spectra at low resolution
Ally Baldelli
UDelaware
Upgrading ABC-SN
Challenge
data encoding
visualization and concept credit: Alex Razim
Kaicheng Zhang et al 2016 ApJ 820 67
SN 2011fe
deSoto+2024
Boone 2017
7% of LSST data
The rest
Lochner et al 2018
Dr. Somayeh Khakpash
LSST Catalyst Fellow
Lehigh University Visiting Prof.
we introduce
Gaussian process Optimized Photometric Regression of Extragalactic Archival Ultraviolet-infrared eXplosions, a.k.a GOPREAUX—
a Python package for Gaussian Process Regression of multi-wavelength transient photometry. [...]
This allows for predictions of light curves [...] at higher redshifts, where the rest-frame UV emission is redshifted into the observer-frame optical or infrared.
Gaussian processes work by imposing a kernel that represents the covariance in the data (how data depend on time or time/wavelength). Imposing the same kernel for different time-domain phenomena is incorrect in principle
=> bias toward known classes
Methodological issues with these approaches
Neural processes replace the imposed kernel with a learned one
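What "imposing a kernel" means in practice can be seen in a minimal Gaussian process regression. This is a generic NumPy sketch with a squared-exponential kernel, not the GOPREAUX implementation; the light curve points and hyperparameters are invented. The same data interpolated with two different imposed time scales give two different light curves.

```python
import numpy as np

def rbf_kernel(t1, t2, amplitude, length_scale):
    """Squared-exponential covariance between two sets of times."""
    dt = np.subtract.outer(t1, t2)
    return amplitude ** 2 * np.exp(-0.5 * (dt / length_scale) ** 2)

def gp_posterior_mean(t_obs, y_obs, y_err, t_new, amplitude, length_scale):
    """Posterior mean of a zero-mean GP with the imposed kernel."""
    K = rbf_kernel(t_obs, t_obs, amplitude, length_scale) + np.diag(y_err ** 2)
    K_star = rbf_kernel(t_new, t_obs, amplitude, length_scale)
    return K_star @ np.linalg.solve(K, y_obs)

# Hypothetical sparse light curve (time in days, flux in arbitrary units)
t_obs = np.array([0.0, 3.0, 7.0, 12.0, 20.0])
y_obs = np.array([0.2, 0.9, 1.0, 0.6, 0.1])
y_err = np.full_like(t_obs, 0.05)
t_new = np.linspace(0.0, 20.0, 41)

# Same data, two different imposed covariance time scales
smooth = gp_posterior_mean(t_obs, y_obs, y_err, t_new, amplitude=1.0, length_scale=10.0)
wiggly = gp_posterior_mean(t_obs, y_obs, y_err, t_new, amplitude=1.0, length_scale=1.0)

print("largest disagreement between the two fits:", np.max(np.abs(smooth - wiggly)))
```

With the short time scale the prediction falls back to the zero prior mean in the gaps between observations, while the long time scale bridges them. The choice of kernel, not the data, decides which happens.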
Siddharth Chaini
NASA FINESST Fellow
anomaly detection
Challenge
Most classifiers for variable stars use Random Forest (not distance based)
In distance based classification, optimal distances can be found for the class of interest: flexible, customizable, efficient
https://arxiv.org/pdf/2403.12120.pdf
Astronomy and Computing
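A minimal illustration of distance-based classification: each class is summarized by a centroid in feature space, and a new object is assigned to the nearest centroid under a chosen distance. This is a toy sketch, not the method or code from the paper linked above; the features, class names, and values are hypothetical, and Euclidean and Canberra are just two example metrics.

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2, axis=-1))

def canberra(a, b):
    denom = np.abs(a) + np.abs(b)
    terms = np.divide(np.abs(a - b), denom,
                      out=np.zeros_like(denom, dtype=float), where=denom > 0)
    return terms.sum(axis=-1)

def fit_centroids(X, y):
    """Median feature vector per class."""
    classes = np.unique(y)
    centroids = np.array([np.median(X[y == c], axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids, metric):
    """Assign each row of X to the class with the nearest centroid."""
    distances = np.stack([metric(X, c) for c in centroids], axis=1)
    return classes[np.argmin(distances, axis=1)], distances

# Hypothetical features: (period in days, amplitude in mag)
X_train = np.array([[0.50, 0.80], [0.60, 1.00], [0.55, 0.90],
                    [5.00, 0.30], [6.00, 0.20], [5.50, 0.25]])
y_train = np.array(["RRL", "RRL", "RRL", "CEP", "CEP", "CEP"])
X_new = np.array([[0.58, 0.85], [5.20, 0.28]])

classes, centroids = fit_centroids(X_train, y_train)
labels_euclidean, _ = predict(X_new, classes, centroids, euclidean)
labels_canberra, dist_canberra = predict(X_new, classes, centroids, canberra)
print(labels_euclidean, labels_canberra)
```

Because the distances themselves are returned, an object that is far from every centroid can be flagged for inspection, and swapping the metric is a one-argument change.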
Check out the poster
These distance-based methods can find true out-of-set anomalies, not just unusual presentations of the usual physics!
And it's "explainable"!
This ensemble distance method excels at identifying out-of-sample anomalies!
NSF award
2219731
Challenge
LEO satellites
The LSST
Science Collaborations
A community of practice founded on principles of Equity, Inclusivity, Cooperation
US-TOPIA
is a word I am borrowing from Margaret Atwood to describe the fact that the future is us.
However loathsome or loving we are, so will we be.
Whereas utopias are the stuff of dreams and dystopias are the stuff of nightmares, ustopias are what we create together when we are wide awake
thank you!
University of Delaware
Department of Physics and Astronomy
Biden School of Public Policy and Administration
Data Science Institute
federica bianco
fbianco@udel.edu
credit: AMNH Dark Universe
AI
ISN'T FREE
ethics of AI
Challenge + Opportunity
Knowledge is power
With great power comes great responsibility
"Sharing is caring"
the butterfly effect
We use astrophysics as a neutral and safe sandbox to learn how to develop and apply powerful tools.
Deploying these tools in the real world can do harm.
Ethics of AI is essential training that all data scientists should receive.
Why does this AI model whiten Obama's face?
Simple answer: the data is biased. The algorithm is fed more images of white people
But really, would the opposite have been acceptable? The bias is in society
Joy Buolamwini
Challenge
ecological AI
[Figure axes: ← Dimmer, Brighter →; ticks 0.01, 0.1, 1, 10, 100]
stellar explosions
stellar eruptions
stellar variability
trained extensively on large amounts of data to solve generic problems
Foundational AI models
We use the ILSVRC-2012 ImageNet dataset with 1k classes
and 1.3M images, its superset ImageNet-21k with
21k classes and 14M images and JFT with 18k classes and
303M high-resolution images.
Typically, we pre-train ViT on large datasets, and fine-tune to (smaller) downstream tasks. For
this, we remove the pre-trained prediction head and attach a zero-initialized D × K feedforward
layer, where K is the number of downstream classes.
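The head swap described in the quote can be sketched with plain arrays. This is an illustration only: random vectors stand in for the output of a pre-trained backbone, and the sizes D and K are arbitrary assumptions. It shows that a zero-initialized D × K head starts from uniform class probabilities before any fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8             # feature dimension of the backbone (hypothetical)
K_pretrain = 1000  # classes seen during pre-training (hypothetical)
K = 3             # downstream classes (hypothetical)

# Stand-in for backbone outputs: one D-dimensional feature vector per input
features = rng.normal(size=(4, D))

# The pre-trained prediction head is removed...
W_pretrain = rng.normal(size=(D, K_pretrain))
del W_pretrain

# ...and replaced by a zero-initialized D x K feedforward layer
W_new = np.zeros((D, K))
b_new = np.zeros(K)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax(features @ W_new + b_new)
print(probs)  # every row is uniform over the K classes before fine-tuning
```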
Limited Field of View: Space telescopes often have smaller fields of view compared to ground-based surveys.
Data Latency: Delays in data transmission and processing can affect rapid follow-up.
Resource Allocation: Competition for telescope time can limit observations of certain transients.... LET'S NOT TRIGGER 3 ToOs ON THE SAME TRANSIENT!!
(Racusin et al., 2008)
GRB 080319B, the brightest optical burst ever observed
SWIFT
rapid response
SWIFT
HST, Chandra, SPITZER
...
Kepler, K2, TESS
high precision dense time series
Kepler satellite EB
LSST (simulated) EB
is transient data AI ready?
The PLAsTiCC challenge winner, Kyle Boone, was a grad student at Berkeley and did not use a neural network!
Hlozek et al, 2020
DATA CURATION IS THE BOTTLENECK
models contributed by the community were in
- different formats (spectra, lightcurves, theoretical, data-driven)
- the people that contributed the models were included in 1 paper at best
- incompleteness
- systematics
- imbalance
Khakpash+ 2024 showed that the models were biased for SN Ibc
AVOCADO, SCONE, all these models are trained on a biased dataset and are being currently used for classification
Ibc data-driven templates vs PLAsTiCC
Dr. Somayeh Khakpash
LSSTC Catalyst Fellow, Rutgers
Visiting Faculty, Lehigh