2024 Key Laboratory of Particle Astrophysics Seminar Series | August 19, 10:00 AM @ IHEP (Institute of High Energy Physics)

Enhancing Gravitational Wave Astronomy with Artificial Intelligence

He Wang (王赫)

hewang@ucas.ac.cn

International Centre for Theoretical Physics Asia-Pacific (ICTP-AP), UCAS

Taiji Laboratory for Gravitational Wave Universe (Beijing/Hangzhou), UCAS

On behalf of the LIGO-VIRGO-KAGRA collaborations

Content

GW Astronomy
AI for Science · GW Data Analysis
GW search · Pipeline
Parameter estimation · Scientific discovery
Key Takeaways
(Space-based GW Detection)

In 1916, A. Einstein predicted the existence of gravitational waves (GW) on the basis of general relativity (GR).
Gravitational waves are a strong-field effect in GR.
- 2015: the first direct detection of GW, from the merger of two black holes, was achieved.
- 2017: the first multi-messenger detection of a binary neutron star (BNS) signal was achieved, marking the beginning of GW multi-messenger astronomy.
- 2017: the Nobel Prize in Physics was awarded for the detection of GW.
- As of now: more than 90 gravitational wave events have been discovered.
- The fourth observing run (O4), which began on May 24, 2023, is currently in progress.

Gravitational waves generated by a binary black hole system

GW detector

LIGO-VIRGO-KAGRA network

2017 Nobel Prize in Physics

Gravitational Wave Astronomy

Technical Challenges: Data Processing for GW

GW Data characteristics

Noise: non-Gaussian and non-stationary
Signal:
- (Earth-based) A low signal-to-noise ratio (SNR): the signal amplitude is typically about 1/100 of the noise amplitude (-40 dB).
- (Space-based) A superposition of all GW signals received during the mission's observational run (e.g., $10^4$ GBs, $10\sim10^2$ SMBHs, and $10\sim10^3$ EMRIs).

Matched filtering techniques

The optimal linear algorithm for extracting weak signals from Gaussian, stationary noise.
It works by correlating a known signal model $h(t)$ (template) with the data.
Starting with data: $d(t) = h(t) + n(t)$ .
Defining the matched-filtering SNR $\rho(t)$ :
$\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2$ , where $\langle d|h \rangle (t) = 4\int^\infty_0\frac{\tilde{d}(f)\tilde{h}^*(f)}{S_n(f)}e^{2\pi ift}df$ ,
$\langle h|h \rangle = 4\int^\infty_0\frac{\tilde{h}(f)\tilde{h}^*(f)}{S_n(f)}df$ , $S_n(f)$ is noise power spectral density (one-sided).
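The definitions above can be sketched numerically. The following is a minimal NumPy illustration, not the implementation used in any search pipeline: it assumes unit-variance white noise (flat one-sided PSD $2/f_s$), a toy sine-Gaussian template, and one common discretization of the integrals. The function name `snr_time_series` and all parameter values are hypothetical.

```python
import numpy as np

def snr_time_series(d, h, psd, fs):
    """Matched-filtering SNR |rho(t)| of data d against template h.

    psd is the one-sided noise PSD sampled on the np.fft.rfft frequency bins.
    Returns (rho, hh), where hh approximates <h|h>.
    """
    n = len(d)
    dt, df = 1.0 / fs, fs / n
    d_f = np.fft.rfft(d) * dt            # discrete stand-in for the Fourier transform
    h_f = np.fft.rfft(h) * dt
    # <h|h> = 4 int_0^inf |h(f)|^2 / S_n(f) df
    hh = 4.0 * np.sum(np.abs(h_f) ** 2 / psd) * df
    # <d|h>(t) = 4 int_0^inf d(f) h*(f) / S_n(f) exp(2 pi i f t) df,
    # evaluated for every time shift t at once with an inverse FFT.
    spectrum = np.zeros(n, dtype=complex)
    spectrum[: d_f.size] = d_f * np.conj(h_f) / psd
    dh = 4.0 * np.fft.ifft(spectrum) * n * df
    return np.abs(dh) / np.sqrt(hh), hh

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
fs, n = 1024, 4096
t = np.arange(n) / fs
template = np.exp(-((t - 2.0) / 0.05) ** 2) * np.sin(2 * np.pi * 100.0 * t)
psd = np.full(n // 2 + 1, 2.0 / fs)      # one-sided PSD of unit-variance white noise

_, hh = snr_time_series(template, template, psd, fs)
shift = 500                               # injection delay in samples
signal = 20.0 / np.sqrt(hh) * np.roll(template, shift)   # optimal SNR of 20
noise = rng.standard_normal(n)

rho_clean, _ = snr_time_series(signal, template, psd, fs)
rho_noisy, _ = snr_time_series(signal + noise, template, psd, fs)
print("peak sample (noise-free):", int(np.argmax(rho_clean)))
print("peak sample (noisy):     ", int(np.argmax(rho_noisy)))
```

The SNR time series peaks at the delay of the injected signal; in the noise-free case the peak value equals the optimal SNR by construction.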


LIGO-VIRGO-KAGRA

LISA / Taiji project


Frequentist hypothesis testing and likelihood principle:
- make some assumptions about signal and noise hypothesis
- write down the likelihood function for a signal in noise
- find the parameters that maximise it
- define a corresponding detection statistic
  $\rightarrow$ recover the MF
Bayesian hypothesis testing:
- start from the same likelihood
- define a prior over signal parameters
- marginalise over them to arrive at a Bayes factor
- The dirty secret: often the Bayes factor is simply treated as a frequentist detection statistic
  $\rightarrow$ recover the MF (for certain prior choices)


Pioneering works utilizing CNN

The most common and direct approach, from Computer Vision (CV) to GW signal processing: pixel point $\Rightarrow$ sampling point.

Convolutional neural networks (CNN) can achieve performance comparable to matched filtering and surpass it in execution speed (with GPU support) under Gaussian stationary noise.

AI for Science $\rightarrow$ AI for GW Astronomy

Artificial Intelligence (AI) has great potential to revolutionize gravitational wave astronomy by improving data analysis, modeling, and detector development.
Representation learning and supervised learning are crucial for extracting features from GW signals: the former identifies informative features autonomously, and the latter leverages labeled data for accuracy.


Exported: Oct, 2023 (in preparation)

PRL, 2018, 120(14): 141103.

PRD, 2018, 97(4): 044039.

GW data processing: applications of AI techniques

GW search · Pipeline


Beyond Speed: Generalization and Discovery in GW Detection

Our primary goal is not speed but the model's ability to generalize and discover new GW signals, including those beyond the reach of matched filtering techniques and General Relativity (GR).
Leveraging our experience in signal modeling (MFCNN) and noise modeling (WaveFormer), we are gradually building an offline pipeline capable of searching for signals in complete GW observation data and calculating false alarm rates (FARs).

He Wang, et al. PRD 101, 10 (2020): 104003

Transform matched-filtering method from frequency domain to time domain.
The square of the matched-filtering SNR for given data $d(t) = n(t)+h(t)$ :

Frequency domain

$\langle h|h \rangle = 4\int^\infty_0\frac{\tilde{h}(f)\tilde{h}^*(f)}{S_n(f)}df$

$\langle d|h \rangle (t) = 4\int^\infty_0\frac{\tilde{d}(f)\tilde{h}^*(f)}{S_n(f)}e^{2\pi ift}df$

$\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2$

$\int\tilde{x}_1(f) \cdot \tilde{x}_2(f) e^{2\pi ift}df= x_1(t)*x_2(t)$

$x_1(t)*x_2^*(-t) = x_1(t)\star x_2(t)$

$\int\tilde{x}_1(f) \cdot \tilde{x}^*_2(f) e^{2\pi ift}df= x_1(t)\star x_2(t)$

Time domain

(matched-filtering) $\langle d|h \rangle (t) \sim \,\bar{d}(t)\ast\bar{h}(-t)$

(normalizing) $\langle h|h \rangle \sim [\bar{h}(t) \ast \bar{h}(-t)]|_{t=0}$

(whitening) $\left\{\begin{matrix} \bar{d}(t) = d(t) * \bar{S}_n(t) \\ \bar{h}(t) = h(t) * \bar{S}_n(t) \end{matrix}\right.$

where

$\bar{S}_n(t)=\int^{+\infty}_{-\infty}S_n^{-1/2}(|f|)e^{2\pi ift}df$

and $S_n(|f|)$ is the one-sided average PSD of $d(t)$
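To make the time-domain form concrete, here is a small NumPy sketch (an illustration under stated assumptions, not the MFCNN code). It whitens data and template with $S_n^{-1/2}(|f|)$, cross-correlates them with an explicit time-domain sum using modulo-N (circular) indexing, and compares the result with the frequency-domain expression. The toy PSD, the sine-Gaussian template, the noise level and the helper names are all hypothetical, and overall constant factors are dropped as in the $\sim$ relations above.

```python
import numpy as np

def whiten(x, psd_two_sided):
    """Circular convolution of x with S_bar_n(t), done as a product in frequency."""
    return np.fft.ifft(np.fft.fft(x) / np.sqrt(psd_two_sided)).real

def circular_cross_correlation(a, b):
    """(a star b)[k] = sum_j a[(j + k) mod N] * b[j], as a direct time-domain sum."""
    n = len(a)
    return np.array([np.dot(np.roll(a, -k), b) for k in range(n)])

rng = np.random.default_rng(1)
fs, n = 256, 256
f_abs = np.abs(np.fft.fftfreq(n, d=1.0 / fs))
psd = 1e-2 * (1.0 + (20.0 / (f_abs + 5.0)) ** 4)    # toy colored PSD, S_n(|f|)

t = np.arange(n) / fs
h = np.exp(-((t - 0.5) / 0.05) ** 2) * np.sin(2 * np.pi * 30.0 * t)
d = np.roll(h, 40) + 0.01 * rng.standard_normal(n)  # template delayed by 40 samples

d_bar, h_bar = whiten(d, psd), whiten(h, psd)
dh_time = circular_cross_correlation(d_bar, h_bar)          # ~ <d|h>(t)
hh_time = circular_cross_correlation(h_bar, h_bar)[0]       # ~ <h|h>
rho_time = dh_time / np.sqrt(hh_time)

# Same quantity from the frequency-domain integrand d(f) h*(f) / S_n(|f|).
dh_freq = np.fft.ifft(np.fft.fft(d) * np.conj(np.fft.fft(h)) / psd).real

print("time and frequency domain agree:", np.allclose(dh_time, dh_freq))
print("peak at sample:", int(np.argmax(rho_time)))
```

Because every step is a convolution or correlation, the whole chain can be expressed with convolution layers, which is the point of moving to the time domain.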

Deep Learning Framework

For the 1-D convolution ( $*$ ) in Apache MXNet, given input data of shape [batch size, channel, length]:

$output[n, i, :] = \sum^{channel}_{j=0} input[n,j,:] \ast weight[i,j,:]$

FYI: the output length is $N_\ast = \lfloor(N-K+2P)/S\rfloor+1$ for input length $N$, kernel size $K$, padding $P$ and stride $S$.

(A schematic illustration of one unit of a convolution layer)
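The channel sum and the output-length formula can be checked with a few lines of plain NumPy. This is a sketch rather than MXNet code: it assumes the usual deep-learning convention in which the kernel slides over the input as a dot product without being flipped, and the helper names are hypothetical.

```python
import numpy as np

def conv_output_length(n, k, p=0, s=1):
    """N_out = floor((N - K + 2P) / S) + 1."""
    return (n - k + 2 * p) // s + 1

def conv1d(x, w, p=0, s=1):
    """Single-channel sliding dot product with zero padding p and stride s."""
    x = np.pad(x, p)
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(0, len(x) - k + 1, s)])

def conv1d_layer(inp, weight, p=0, s=1):
    """inp: [batch, channel, length]; weight: [out_channel, channel, K]."""
    batch, channels, n = inp.shape
    out_channels, _, k = weight.shape
    out = np.zeros((batch, out_channels, conv_output_length(n, k, p, s)))
    for b in range(batch):
        for i in range(out_channels):
            for j in range(channels):       # sum over input channels j
                out[b, i] += conv1d(inp[b, j], weight[i, j], p, s)
    return out

rng = np.random.default_rng(0)
inp = rng.standard_normal((2, 3, 20))
weight = rng.standard_normal((4, 3, 5))
out = conv1d_layer(inp, weight, p=1, s=2)
print(out.shape)
```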

modulo-N circular convolution

Data Preprocessing and Training Strategy

Given $d = h + n$ , we can normalize $d$ as follows:

$\frac{d-mean}{std} = \frac{h}{std}+\frac{n-mean}{std}$

(Figure: WaveFormer training data flow. Panel labels: Strain (∼ $10^{-19}$ ), Whiten (∼ $10^{2}$ ), Normalized (∼ $10^{0}$ ), for a 32 s segment around the merger time $t_c$ of GW150914. Pipeline labels: Signal $_i$ $\oplus$ Noise $_i$ (PSD $_i$ from noise; network SNR calculated); band-pass [20, 2048] Hz; standard normalization by $std$ ; Input $_i$ of 8.0625 s with shape [1, 16512]; patching (tokenization) with size 0.125 s and 50% overlap into [1, 128, 256]; dynamic masking; WaveFormer; Output $_i$ and Label $_i$ of shape [1, 128, 256]; MSE-Loss $_i$ .)

Implementations:
- PSD sampling from real noise.
- input size: 8.0625 sec
- fs = 2048Hz
- Band-pass: 20~2048Hz
- Masked loss

He Wang et al 2024 Mach. Learn.: Sci. Technol. 5 015046

Search Strategy Overview

First, WaveFormer produces the denoised output.
Then, triggers are defined and identified in three steps:
1. Find peaks. Locate triggers in each single detector by finding the global maximum and all local maxima that are at least 0.2 s away from neighboring maxima.
2. Keep only the triggers that are present in both detectors; these are the VALID triggers (each consisting of 3~4 segments).
3. Calculate the cross-correlation of the to-be-evaluated trigger across channels or within a single channel, between its noisy and corresponding denoised segments, as well as between denoised segments themselves.
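The three steps can be sketched as follows. This is a simplified illustration on synthetic data, not the published pipeline: the amplitude threshold, the 10 ms coincidence window, the use of the absolute value of the denoised output, and the normalized cross-correlation used as a stand-in ranking quantity are all assumptions made for this example.

```python
import numpy as np

def find_triggers(x, fs, min_sep=0.2, threshold=0.0):
    """Step 1: local maxima of |x|, strongest first, at least min_sep seconds apart."""
    a = np.abs(x)
    inner = a[1:-1]
    is_peak = (inner > a[:-2]) & (inner >= a[2:]) & (inner > threshold)
    candidates = np.flatnonzero(is_peak) + 1
    candidates = candidates[np.argsort(a[candidates])[::-1]]
    gap = int(round(min_sep * fs))
    kept = []
    for i in candidates:
        if all(abs(int(i) - j) >= gap for j in kept):
            kept.append(int(i))
    return sorted(kept)

def coincident_triggers(trig_h, trig_l, fs, window=0.01):
    """Step 2: keep (H, L) trigger pairs that lie within `window` seconds."""
    max_lag = int(round(window * fs))
    return [(i, j) for i in trig_h for j in trig_l if abs(i - j) <= max_lag]

def peak_normalized_xcorr(a, b):
    """Step 3 (simplified): peak of the normalized cross-correlation of two segments."""
    c = np.correlate(a, b, mode="full")
    return float(np.max(np.abs(c)) / (np.linalg.norm(a) * np.linalg.norm(b)))

fs = 1024
t = np.arange(4 * fs) / fs

def burst(t0):
    return np.exp(-((t - t0) / 0.01) ** 2)

den_h = burst(1.0) + burst(3.0)              # toy denoised output, detector H
den_l = burst(1.0 + 5 / fs) + burst(2.0)     # toy denoised output, detector L

trig_h = find_triggers(den_h, fs, threshold=0.1)
trig_l = find_triggers(den_l, fs, threshold=0.1)
valid = coincident_triggers(trig_h, trig_l, fs)

half = int(0.1 * fs)
i, j = valid[0]
rank = peak_normalized_xcorr(den_h[i - half:i + half], den_l[j - half:j + half])
print(trig_h, trig_l, valid, round(rank, 3))
```

Only the burst near 1 s appears in both toy detectors within the window, so it is the single valid trigger; the isolated bursts at 2 s and 3 s are discarded.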

(Figure labels: noisy input segments; denoised output segments; $\bar{H}$ , $\bar{L}$ , ${H}$ , ${L}$ ; ranking statistic $\rho_\text{ranking}$ .)

He Wang et al 2024 Mach. Learn.: Sci. Technol. 5 015046

AI for Gravitational Wave: Parameter Estimation

A complete 15-dimensional posterior probability distribution, taking about 1 s (<< $10^4$ s).

Prior sampling: 50,000 posterior samples in approximately 8 seconds.

Capable of calculating evidence
Processing time: (using 64 CPU cores)
- less than 1 hour with IMRPhenomXPHM,
- approximately 10 hours with SEOBNRv4PHM

PRL 127, 24 (2021) 241103.

PRL 130, 17 (2023) 171403.

Nature Physics 18, 1 (2022) 112–17

Big Data Mining and Analytics 5, 1 (2021) 53–63.

A diagram of prior sampling between feature space and physical parameter space

(Based on arXiv:1912.02762)

[Machine Learning] Whiteboard Derivation Series (33): Flow-based Model

Normalizing Flow Model (1/4)

The main idea of flow-based modeling is to express $\mathbf{y}\in\mathbb{R}^D$ as a transformation $T$ of a real vector $\mathbf{z}\in\mathbb{R}^D$ sampled from $p_{\mathrm{z}}(\mathbf{z})$ :

$\mathbf{y}=T(\mathbf{z}) \quad \text { where } \quad \mathbf{z} \sim p_{\mathrm{z}}(\mathbf{z})$

Note: The invertible and differentiable transformation $T$ and the base distribution $p_{\mathrm{z}}(\mathbf{z})$ can have parameters $\{\boldsymbol{\phi}, \boldsymbol{\psi}\}$ of their own, i.e. $T_{\phi}$ and $p_{\mathrm{z},\boldsymbol{\psi}}(\mathbf{z})$ .

Change of Variables:

$p_{\mathrm{y}}(\mathbf{y})=p_{\mathrm{z}}(\mathbf{z})\left|\operatorname{det} J_{T}(\mathbf{z})\right|^{-1} \quad \text { where } \quad \mathbf{z}=T^{-1}(\mathbf{y}) .$

The Jacobian $J_{T}(\mathbf{z})$ is the $D \times D$ matrix of all partial derivatives of $T$ , given by:

$J_{T}(\mathbf{z})=\left[\begin{array}{ccc} \frac{\partial T_{1}}{\partial \mathrm{z}_{1}} & \cdots & \frac{\partial T_{1}}{\partial \mathrm{z}_{D}} \\ \vdots & \ddots & \vdots \\ \frac{\partial T_{D}}{\partial \mathrm{z}_{1}} & \cdots & \frac{\partial T_{D}}{\partial \mathrm{z}_{D}} \end{array}\right]$

Equivalently,

$p_{\mathrm{y}}(\mathbf{y})=p_{\mathrm{z}}\left(T^{-1}(\mathbf{y})\right)\left|\operatorname{det} J_{T^{-1}}(\mathbf{y})\right|$

(Figure: the base density $p_{\mathrm{z}}(\mathbf{z})$ is mapped by $T$ to the target density $p_{\mathrm{y}}(\mathbf{y})$ , and $T^{-1}$ maps back.)

(Based on arXiv:1912.02762)
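As a quick numerical check of the change-of-variables formula, the sketch below uses the simplest invertible map, an affine transformation $T(\mathbf{z}) = A\mathbf{z} + \mathbf{b}$ of a standard normal base density in $D = 2$. The matrix `A`, the offset `b` and the test point are arbitrary values chosen for illustration. The density obtained from the formula should match the closed-form Gaussian $\mathcal{N}(\mathbf{b}, AA^{\top})$.

```python
import numpy as np

A = np.array([[2.0, 0.5],
              [0.0, 1.5]])        # invertible matrix defining T(z) = A z + b
b = np.array([1.0, -1.0])

def log_p_z(z):
    """Log density of the standard normal base distribution."""
    return -0.5 * (z @ z) - 0.5 * len(z) * np.log(2.0 * np.pi)

def T(z):
    return A @ z + b

def T_inv(y):
    return np.linalg.solve(A, y - b)

def log_p_y(y):
    """log p_z(T^{-1}(y)) + log|det J_{T^{-1}}(y)|, where J_{T^{-1}} = A^{-1}."""
    return log_p_z(T_inv(y)) - np.log(abs(np.linalg.det(A)))

def log_gaussian(y, mean, cov):
    """Closed-form log density of a multivariate normal, for comparison."""
    r = y - mean
    return -0.5 * (r @ np.linalg.solve(cov, r)) - 0.5 * np.log(np.linalg.det(2.0 * np.pi * cov))

y = np.array([0.3, 2.0])
print(log_p_y(y), log_gaussian(y, b, A @ A.T))
```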

Normalizing Flow Model (2/4)

Data: target data $\mathbf{y}\in\mathbb{R}^{15}$ (with condition data $\mathbf{x}$ ).
Task:
- Fitting a flow-based model $p_{\mathrm{y}}(\mathbf{y} ; \boldsymbol{\theta})$ to a target distribution $p_{\mathrm{y}}^{*}(\mathbf{y})$
- by minimizing KL divergence with respect to the model’s parameters $\boldsymbol{\theta}=\{\boldsymbol{\phi}, \boldsymbol{\psi}\}$ ,
- where $\boldsymbol{\phi}$ are the parameters of $T$ and $\boldsymbol{\psi}$ are the parameters of $p_{\mathrm{z}}(\mathbf{z})=\mathcal{N}(0,\mathbb{I})$ .
Loss function:

$\begin{aligned} \mathcal{L}(\boldsymbol{\theta}) &=D_{\mathrm{KL}}\left[p_{\mathrm{y}}^{*}(\mathbf{y}) \| p_{\mathrm{y}}(\mathbf{y} ; \boldsymbol{\theta})\right] \\ &=-\mathbb{E}_{p_{\mathbf{y}}^{*}(\mathbf{y})}\left[\log p_{\mathbf{y}}(\mathbf{y} ; \boldsymbol{\theta})\right]+\text { const. } \\ &=-\mathbb{E}_{p_{\mathbf{y}}^{*}(\mathbf{y})}\left[\log p_{\mathrm{z}}\left(T^{-1}(\mathbf{y} ; \boldsymbol{\phi}) ; \boldsymbol{\psi}\right)+\log \left|\operatorname{det} J_{T^{-1}}(\mathbf{y} ; \boldsymbol{\phi})\right|\right]+\mathrm{const} . \end{aligned}$

where the constant is $\mathbb{E}_{p_{\mathbf{y}}^{*}(\mathbf{y})}\left[\log p_{\mathbf{y}}^{*}(\mathbf{y})\right]$ , which does not depend on $\boldsymbol{\theta}$ .

Assuming we have a set of samples $\left\{\mathbf{y}_{n}\right\}_{n=1}^{N}\sim p_{\mathrm{y}}^{*}(\mathbf{y})$ , the expectation can be estimated by Monte Carlo:

$\mathcal{L}(\boldsymbol{\theta}) \approx-\frac{1}{N} \sum_{n=1}^{N} \left[\log p_{\mathrm{z}}\left(T^{-1}\left(\mathbf{y}_{n} ; \boldsymbol{\phi}\right) ; \boldsymbol{\psi}\right)+\log \left|\operatorname{det} J_{T^{-1}}\left(\mathbf{y}_{n} ; \boldsymbol{\phi}\right)\right|\right]+\mathrm{const.}$

Minimizing the above Monte Carlo approximation of the KL divergence is equivalent to fitting the flow-based model to the samples $\left\{\mathbf{y}_{n}\right\}_{n=1}^{N}$ by maximum likelihood estimation.
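The equivalence with maximum likelihood can be seen on a toy problem. The sketch below fits a one-dimensional affine flow $y = e^{s} z + m$ (a deliberately simple stand-in for a real flow architecture) to samples from a Gaussian by gradient descent on the Monte Carlo loss. The target distribution, the learning rate and the number of steps are illustrative assumptions. At convergence the flow parameters reproduce the maximum-likelihood estimates, i.e. the sample mean and the sample standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=5000)    # samples from the target p*_y

def nll(m, s, y):
    """Monte Carlo loss: -mean[ log p_z(T^{-1}(y)) + log|det J_{T^{-1}}(y)| ]."""
    u = (y - m) * np.exp(-s)                      # u = T^{-1}(y)
    log_p_z = -0.5 * u ** 2 - 0.5 * np.log(2.0 * np.pi)
    log_det_inv = -s                              # d T^{-1} / dy = exp(-s)
    return -np.mean(log_p_z + log_det_inv)

m, s, lr = 0.0, 0.0, 0.1
loss_start = nll(m, s, y)
for _ in range(3000):
    u = (y - m) * np.exp(-s)
    grad_m = -np.mean(u) * np.exp(-s)             # analytic dL/dm
    grad_s = 1.0 - np.mean(u ** 2)                # analytic dL/ds
    m -= lr * grad_m
    s -= lr * grad_s
loss_end = nll(m, s, y)

print("flow shift m :", m, " sample mean:", y.mean())
print("flow scale   :", np.exp(s), " sample std :", y.std())
```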

Normalizing Flow Model (3/4)

Rational Quadratic Neural Spline Flows (RQ-NSF)

Train: $\vec\theta = (m_1,m_2,d_L, ...) \sim P_{prior}$ ; $\vec{x}=\vec{h}_{\vec{\theta}} + \vec{n}$ ; nflow; $\vec{z} \Rightarrow \mathcal{N}(0,\mathbb{I})$

Normalizing Flow Model (4/4)

(Schematic of the normalizing flow model)

Train: $\vec\theta = (m_1,m_2,d_L, ...) \sim P_{prior}$ ; $\vec{x}=\vec{h}_{\vec{\theta}} + \vec{n}$ ; nflow; $\vec{z} \Rightarrow \mathcal{N}(0,\mathbb{I})$

Test: $\vec{z} \sim \mathcal{N}(0,\mathbb{I})$ ; $\vec{x}=\vec{h}_{\vec{\theta}} + \vec{n}$ ; nflow; $\vec\theta = (m_1,m_2,d_L, ...) \sim P_{posterior}$

Bayesian inference, the Holy Grail of gravitational-wave data analysis,
enables astrophysical interpretation and scientific discoveries.

Simulation-Based Inference (SBI)

SBI $\Rightarrow$ Fast and precise parameter estimation.
SBI $\Rightarrow$ TGR / Cosmology / PTA ...


PRL 127, 24 (2021) 241103.

PRL 130, 17 (2023) 171403.

Real-time gravitational wave science with neural posterior estimation

Sampling with prior knowledge for high-dimensional gravitational wave data analysis

He Wang, et al. Big Data Min. Anal. (2021)

PRD 108, 4 (2023): 044029.

Neural Posterior Estimation with Guaranteed Exact Coverage: The Ringdown of GW150914

arXiv:2310.13405, LIGO-P2300306

Cosmological Inference using Gravitational Waves and Normalising Flows

Normalizing Flows as an Avenue to Studying Overlapping Gravitational Wave Signals

DOI: 10.1103/PhysRevLett.130.171402

Parameter Estimation · Scientific Discovery

arXiv:2310.12209

Fast Parameter Inference on Pulsar Timing Arrays with Normalizing Flows

arXiv:2404.14286

Exact coverage first!

Paradigm

New
discovery

first!

PRD 108, 4 (2023): 044029.


Appreciating the Ringdown Overtone Test of GW150914

A notable example is the ringdown overtone test of GW150914. Acknowledging that DINGO-like precision is difficult to achieve for complex waveforms, it instead leverages the speed advantage of AI.
By simulating the signal and $10^3$ realizations of LIGO noise for each pixel, it accomplishes what is infeasible with MCMC methods, strategically trading precision for speed.

Parameter Estimation · Scientific Discovery

Parameter estimation · Scientific discovery


Exploring Stochastic Gravitational Wave Background with AI

Utilizing AI for parameter estimation in the stochastic gravitational wave background (SGWB) presents a fascinating blend of rich theoretical content and the potential for optimizing current data processing methods.
While still preliminary and ongoing, our work shows promising results for high SNR SGWB scenarios, where AI-based posterior probabilities are notably more precise and narrower compared to traditional cross-correlation methods used in PyGWB.

$\Omega_{\mathrm{GW}}(f)=\Omega_{\mathrm{ref}}\left(\frac{f}{f_{\mathrm{ref}}}\right)^\alpha$ , with $\Omega_{\mathrm{ref}}=10^{-6.1}$

Our result (preliminary)

Parameter estimation · Scientific discovery


Exploring Stochastic Gravitational Wave Background with AI

Performance saturation is observed between SNR levels of $10^{-6}$ and $10^{-7}$ , indicating a plateau in model effectiveness in low-SNR conditions.
Unlike PyGWB, which can accumulate cross-correlation data from SGWB to further constrain the power spectrum, AI model outputs do not readily provide statistically meaningful information for aggregation. Multiplying posterior probabilities from multiple segments leads to ambiguous, and potentially biased, results due to the lack of statistically significant fluctuations across different posterior distributions.

Abbott R, et al. PRD 104, 2 (2021): 022004.

PyGWB result

Our result (preliminary)

$\Omega_{\mathrm{GW}}(f)=\Omega_{\mathrm{ref}}\left(\frac{f}{f_{\mathrm{ref}}}\right)^\alpha$

Key Takeaways

AI vs Classical Methods

Strictly speaking, AI methods are not yet a full replacement for traditional methods (e.g., MF, MCMC).
- There is a pressing need for the theoretical refinement of ML applications in GW statistics, aiming to bridge current gaps and enhance model reliability.
- How can we address the issue of strong or unacceptable biases that occur when outputs from AI models are used jointly or in combination to measure properties of a population, sub-population, or ensemble?
  (also addressed by arXiv:2405.18095)
How can we construct a gravitational wave statistical theory based on machine learning methods to achieve robust theoretical guarantees for detection statistics and statistical inference?
- $\rightarrow$ Is the goal to achieve results that are entirely consistent with the statistical properties of traditional methods, or
- $\rightarrow$ to quantify and calibrate any results provided by AI within the statistical theory framework?

Gravitational wave sources:

Galactic Binary (GB) [ $\mathcal{O}(10^4) \text{ in } \mathcal{O}(10^7)$ ]
Massive Black Hole Binary (MBHB) [ $\mathcal{O}(2)\sim\mathcal{O}(10^2)$ ]
Extreme Mass-Ratio Inspiral (EMRI) [ $\mathcal{O}(10)\sim\mathcal{O}(10^3)$ ]
Stellar-mass Black Hole Binary (SBHB)
Stochastic Gravitational Wave Background (SGWB)
Unmodelled sources (e.g., bursts, ...)

Credit: ESA, K. Holley-Bockelmann

(Sec.8.3.1 The Red Book)

The analysis of scientific data from space-based GW detection differs significantly from ground-based detection:

A superposition of overlapping signals ( $\neq$ isolated event)
Observations of more waveform periods over different time scales ( $\neq$ short-duration signals)
Signal-dominated detection ( $\neq$ noise-dominated)
Reliance on more complex techniques for noise assessment ( $\neq$ regular acquisition of signal-free data)

Space-borne GW Detection: Background

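Challenges in scientific data processing for space-based gravitational wave detection and applications of artificial intelligence

He Wang (王赫), Minghui Du (杜明辉), Peng Xu (徐鹏), Yufeng Zhou (周宇峰)

2024, Vol. 54, No. 7, 270403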

https://doi.org/10.1360/SSPMA-2024-0087

Rapid PE for Space-borne GW Detection

M. Du, B. Liang, HW, P. Xu, Z. Luo, Y. Wu. SCPMA 67, 230412 (2024).

Global vs. Individual Analysis: While global-fit techniques effectively manage the dense overlapping of signals in space-based GW data, individual pipelines are crucial for detecting unique events.
Role of Individual Pipelines: These pipelines act as a pre-processing step, focusing on particular types of sources and diving deeper into the data. They refine the analysis by working on the latest best-fit residuals from the global fit.
Case Study - MBHB Mergers: Mergers of MBHBs often exhibit high SNRs, between $10^2$ and $10^3$ , and appear as distinct peaks in the data time series.

Data curation
- Model: frequency domain; PhenomD; TDI-A/E response
- Input: 1 day length; 15Hz; shape=(2, 3, 2877)
- Noise: Gaussian stationary from the noise PSD (for training/test) + GB confusion noise (for test)
- Project: Taiji program

M. Du, B. Liang, HW, P. Xu, Z. Luo, Y. Wu. SCPMA 67, 230412 (2024).

Customization for the Taiji scenario: A scalable approach

The top section of the illustration shows the solar system barycenter (SSB) and Taiji frames, with two black dashed arrows symbolizing not two separate GW signals, but rather indicating how the sky location and arrival time of the same GW signal take different values in these two frames.

The forward (“positive”) problem translates the SSB-frame parameters to their Taiji-frame counterparts via a time-dependent mapping $f_1$ , then to the TDI outputs through a time-independent mapping $f_2$ , and an exponential term.

TDI-A

These steps can be schematically summarized as:

$A, E(f)=\sum_\alpha \mathcal{T}_\alpha^{A, E}(f) \tilde{h}_\alpha(f), \quad \alpha \in\{+, \times\}$

where $\mathcal{T}_\alpha^{A, E}(f)$ is often referred to as the transfer function.

M. Du, B. Liang, HW, P. Xu, Z. Luo, Y. Wu. SCPMA 67, 230412 (2024).

Customization for the Taiji scenario: A scalable approach

Consequently, even if the network has only learned the time-dependent relationship between $\boldsymbol{\theta}_S$ and the TDI response at a specific reference time $t_{\rm ref}$ (the 30th day in our case), with the aid of the coordinate transformation it has essentially learned the time-invariant mapping $f_2$ and can then be generalized to perform parameter estimation at any other reference time.
It is worth noting that our method relies on analytical orbits and the time-independence of the coordinate transformation $f_2$ .

1 year length

can infer at any other reference time

trained on the 30th day only

M. Du, B. Liang, HW*, P. Xu, Z. Luo, Y. Wu*. SCPMA 67, 230412 (2024).

Multimodality in extrinsic parameters

Overview of Findings: Nested sampling results indicate minimal expected multimodality in ecliptic coordinates. However, distinct peaks identified in the time of coalescence ( $t_c$ ), labeled as NF-1 (dominant) and NF-2 (subdominant), highlight unique multimodal behavior.
Impact on PE: The presence of these peaks affects the posterior distributions of extrinsic parameters, potentially leading to inaccuracies in $t_c$ and subsequent parameters due to phase term associations and inherent degeneracies.
Model Performance: Despite the multimodality, the best-fit values from the NF model agree with the true values within $1\sigma$ for most parameters and within $2\sigma$ for the others.
Comparative Analysis: The ML pipeline tends to produce broader posteriors compared to the Bayesian nested sampling approach.

(NF = Normalizing Flow model)

M. Du, B. Liang, HW, P. Xu, Z. Luo, Y. Wu. SCPMA 67, 230412 (2024).

Ongoing & Future Plan

Earth-based GW detection ( $\sim10^2$ Hz)

A Python Toolbox for Gravitational Wave Astronomy: GWToolkit
- This toolbox is powered by Ray/JAX and supports both CPU and GPU. It is designed specifically for machine learning applications.
Gravitational-Wave Observatory Open Source Data Portal
(a shared data portal based on open-source data from gravitational-wave detection)
Can AI identify new GW events from LIGO data?
- Could this be a GW signal beyond General Relativity (GR)?
How can we address the issue of strong or unacceptable biases that occur when outputs from AI models are used jointly or in combination to measure properties of a population, sub-population, or ensemble?
(also addressed by arXiv:2405.18095)


Space-based GW detection ( $\sim10^{-3}$ Hz)

“Global fit” challenge
- How can we achieve and accelerate the Bayesian inference through algorithmic innovations?
  - Flow-based proposal?
  - Transdimensional Nested Sampling?
- How can we leverage powerful LLM-based methods to accomplish this?


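“Dongfang” (Orient) supercomputing system at the Computer Network Information Center, Chinese Academy of Sciences (fully domestic CPU/GPGPU)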

Ongoing and Future Projects

Neural density estimation

Density fit for posterior distributions
- use the old posterior to form a proposal for the extended data.
Density fit for the Galaxy
- fit a Galaxy model for the joint distribution of $(A, \beta, \lambda)$ .
...


Ref:

Ashton, G, and C Talbot. MNRAS 507, no. 2 (2021): 2037–51.
Korsakova, N, et al. (2402.13701)
Wouters, T, et al. (2404.11397)

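	import mxnet as mx
	from mxnet import nd, gluon
	from mxnet.gluon.nn import Activation, Conv2D, Dense, Flatten, MaxPool2D
	from loguru import logger

	# NOTE: MatchedFilteringLayer and CutHybridLayer are custom blocks that are not
	# shown on this slide; they must be defined or imported before calling MFCNN.

	def MFCNN(fs, T, C, ctx, template_block, margin, learning_rate=0.003):
	    logger.success('Loading MFCNN network!')
	    net = gluon.nn.Sequential()
	    with net.name_scope():
	        # Matched-filtering front end: first and last template columns for H1 and L1
	        net.add(MatchedFilteringLayer(mod=fs*T, fs=fs,
	                                      template_H1=template_block[:, :1],
	                                      template_L1=template_block[:, -1:]))
	        net.add(CutHybridLayer(margin=margin))
	        # Two convolution + pooling stages with (1, 3) kernels along the last axis
	        net.add(Conv2D(channels=16, kernel_size=(1, 3), activation='relu'))
	        net.add(MaxPool2D(pool_size=(1, 4), strides=2))
	        net.add(Conv2D(channels=32, kernel_size=(1, 3), activation='relu'))
	        net.add(MaxPool2D(pool_size=(1, 4), strides=2))
	        # Fully connected head with two outputs
	        net.add(Flatten())
	        net.add(Dense(32))
	        net.add(Activation('relu'))
	        net.add(Dense(2))
	    # Initialize parameters of all layers
	    net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx, force_reinit=True)
	    return net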
