May 17, 2024, 17:05-17:20

2024 Chongqing Symposium on Gravitation and Astrophysics · Chongqing University of Posts and Telecommunications

Exploring the Frontiers of Parameter Estimation with AI in Gravitational Wave Research

王赫 (He Wang)


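University of Chinese Academy of Sciences · International Centre for Theoretical Physics Asia-Pacific (ICTP-AP)

University of Chinese Academy of Sciences · Taiji Laboratory for Gravitational Wave Universe (Beijing/Hangzhou)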

In cooperation with

Z.Cao, Z.Ren, M.Du, B.Liang, P.Xu, Z.Luo, Y.Wu, et al.

  • non-GW
    • NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport (1903.03704)
    • Accelerated Bayesian inference using deep learning (https://doi.org/10.1093/mnras/staa1469)
    • Nested Sampling Methods (2101.09675)
    • pocoMC: A Python package for accelerated Bayesian inference in astronomy and cosmology (2207.05660)
    • Parallelized Acquisition for Active Learning using Monte Carlo Sampling (2305.19267)
    • NAUTILUS: boosting Bayesian importance nested sampling with deep learning (2306.16923)
    • Improving Gradient-guided Nested Sampling for Posterior Inference (2312.03911)
    • floZ: Evidence estimation from posterior samples with normalizing flows (2404.12294)
    • Deep Learning and genetic algorithms for cosmological Bayesian inference speed-up (2405.03293)
  • GW
    • Nested Sampling with Normalising Flows for Gravitational-Wave Inference (2102.11056)
    • Bilby-MCMC: an MCMC sampler for gravitational-wave inference (2106.08730)
    • Nested sampling for physical scientists (2205.15570)
    • Fast gravitational wave parameter estimation without compromises (2302.05333)
    • Importance nested sampling with normalising flows (2302.08526)
    • Neural density estimation for Galactic Binaries in LISA data analysis (2402.13701)
    • Robust parameter estimation within minutes on gravitational wave signals from binary neutron star inspirals (2404.11397)

10+5 = 15min

space-based (2)

what is flow and flow-based (4)

how flow can be used in MCMC.

mini Global-fit + flow





Evidence for eccentricity in the population of binary black holes observed by LIGO-Virgo-KAGRA




  • In 1916, A. Einstein predicted the existence of gravitational waves (GW) on the basis of general relativity (GR).

  • Gravitational waves are a strong-field effect in GR.

    • 2015: the first direct detection of GW, from the merger of two black holes.

    • 2017: the first multi-messenger detection of a binary neutron star (BNS) signal, marking the beginning of multi-messenger astronomy.

    • 2017: the Nobel Prize in Physics was awarded for the detection of GW.

    • As of now: more than 90 gravitational wave events have been discovered.

    • O4, which began on May 24th 2023, is currently in progress.

Gravitational Wave Astronomy

Gravitational waves generated by a binary black hole system

GW detector

Gravitational Wave Astronomy

  • Fundamental Physics
    • Existence of gravitational waves
    • Constraints on the properties of gravitons
  • Astrophysics
    • Refine our understanding of stellar evolution
    • and the behavior of matter under extreme conditions.
  • Cosmology
    • The measurement of the Hubble constant
    • Dark energy

The first GW event: GW150914

Parameter estimation · Scientific discovery

Credit: LIGO Magazine.

  • Traditional parameter estimation (PE) techniques rely on Bayesian analysis (posteriors + evidence).

  • Estimating the full 15-dimensional posterior distribution is very time-consuming:
    • Evaluating the likelihood function
    • Generating waveform templates
  • Machine learning algorithms are expected to speed this up.
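As a toy illustration of the two Bayesian outputs named above (posterior and evidence), the following sketch evaluates both on a grid for a one-parameter model. The sinusoidal "template", the noise level, and the prior range are assumptions made for illustration and are not a GW waveform model. A real analysis repeats this kind of likelihood evaluation across 15 dimensions, which is where the cost comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all values are illustrative assumptions).
t = np.linspace(0.0, 1.0, 256)
sigma = 1.0          # white-noise standard deviation
true_amp = 2.0

def template(amp):
    """Hypothetical one-parameter 'waveform': a sinusoid of amplitude amp."""
    return amp * np.sin(2.0 * np.pi * 5.0 * t)

data = template(true_amp) + sigma * rng.normal(size=t.size)

# Uniform prior on the amplitude over [0, 5].
amps = np.linspace(0.0, 5.0, 1001)
d_amp = amps[1] - amps[0]
prior = np.full_like(amps, 1.0 / 5.0)

# Gaussian log-likelihood up to an amplitude-independent constant;
# each grid point costs one template evaluation.
log_like = np.array(
    [-0.5 * np.sum((data - template(a)) ** 2) / sigma**2 for a in amps]
)

# Evidence Z = integral of likelihood * prior, computed stably in log space
# (it carries the same dropped constant as log_like).
log_ref = log_like.max()
weights = np.exp(log_like - log_ref) * prior
log_evidence = log_ref + np.log(np.sum(weights) * d_amp)

# Posterior = likelihood * prior / evidence, normalized on the grid.
posterior = weights / (np.sum(weights) * d_amp)
amp_map = amps[np.argmax(posterior)]
```

A grid with 1001 points is affordable in one dimension; the same resolution in 15 dimensions is not, which is why stochastic samplers or learned density estimators are used instead.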

Challenges of Parameter Estimation for GW

Bayesian statistics

Data quality improvement

Credit: Marco Cavaglià 

LIGO-Virgo data processing

GW searches

Astrophysical interpretation of GW sources

AI for Gravitational Wave: Parameter Estimation

  • A complete 15-dimensional posterior probability distribution, taking about 1 s ($\ll 10^4$ s).
  • Prior sampling: 50,000 posterior samples in approximately 8 seconds.
  • Capable of calculating the evidence
  • Processing time (using 64 CPU cores):
    • less than 1 hour with IMRPhenomXPHM,
    • approximately 10 hours with SEOBNRv4PHM
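One standard way to obtain an evidence estimate from a learned posterior approximation is importance sampling: draw samples from the approximation $q(\theta)$, weight each by likelihood times prior divided by $q$, and average the weights. The sketch below shows the idea on a one-dimensional toy problem with a known answer. The Gaussian target, the Gaussian stand-in for the network output, and all numerical values are assumptions for illustration; the slide does not state that the cited works use exactly this procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_normal_pdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

# Toy unnormalized posterior: likelihood * prior = Z_true * N(theta; 1.0, 0.5^2).
log_z_true = np.log(3.0)

def log_like_times_prior(theta):
    return log_z_true + log_normal_pdf(theta, 1.0, 0.5)

# Stand-in for a learned proposal q(theta | d): slightly offset and broader
# than the true posterior (an assumption, not a trained network).
q_mean, q_std = 0.9, 0.6
n = 50_000
theta = q_mean + q_std * rng.normal(size=n)

# Importance weights w_i = L(theta_i) * pi(theta_i) / q(theta_i).
log_w = log_like_times_prior(theta) - log_normal_pdf(theta, q_mean, q_std)

# Evidence estimate Z ~ mean(w), computed in log space for stability.
log_ref = log_w.max()
w = np.exp(log_w - log_ref)
log_z_est = log_ref + np.log(w.mean())

# Sample efficiency: effective sample size divided by n.
efficiency = w.sum() ** 2 / (n * np.sum(w**2))
```

The efficiency drops as the proposal drifts away from the true posterior, so it doubles as a diagnostic of how well the learned density fits.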

PRL 127, 24 (2021) 241103.

PRL 130, 17 (2023) 171403.

Nature Physics 18, 1 (2022) 112–17

HW, et al. Big Data Mining and Analytics 5, 1 (2021) 53–63.

A diagram of prior sampling between feature space and physical parameter space

Rapid PE for Space-borne GW Detection

  • Data curation

    • Model: frequency domain, PhenomD, TDI A and E channel response

    • Data: 1 day, 15 s per sample, shape = (2, 3, 2877)

    • Noise: stationary Gaussian noise from the PSD + Galactic binary (GB) confusion noise

    • Project: Taiji program
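The noise item above can be sketched as follows: stationary Gaussian noise is drawn independently in each frequency bin with a variance set by the one-sided PSD. The duration and cadence are taken from the slide; the PSD function is a made-up power law, not the Taiji noise curve, and the mapping of the draws to TDI channels is likewise an assumption. The sketch does not attempt to reproduce the (2, 3, 2877) array layout.

```python
import numpy as np

rng = np.random.default_rng(2)

duration = 86400.0                  # 1 day in seconds (from the slide)
dt = 15.0                           # sampling cadence in seconds (from the slide)
n_time = int(duration / dt)         # 5760 time samples
df = 1.0 / duration                 # frequency resolution
freqs = np.fft.rfftfreq(n_time, d=dt)[1:]   # positive frequencies, DC bin dropped

def toy_psd(f):
    """Hypothetical one-sided PSD in arbitrary units; NOT the Taiji noise curve."""
    return 1e-40 * (1.0 + (1e-3 / f) ** 4)

def draw_noise(psd, df, rng):
    """Draw stationary Gaussian noise in the frequency domain.

    For a one-sided PSD S_n(f), <|n(f)|^2> = S_n(f) / (2 df), so the real and
    imaginary parts each have variance S_n(f) / (4 df).
    """
    std = np.sqrt(psd / (4.0 * df))
    return std * (rng.normal(size=psd.size) + 1j * rng.normal(size=psd.size))

psd = toy_psd(freqs)
noise_a = draw_noise(psd, df, rng)  # e.g. one TDI channel
noise_e = draw_noise(psd, df, rng)  # e.g. a second, independent channel

# Whitening by the per-bin standard deviation should give unit variance.
whitened = noise_a / np.sqrt(psd / (4.0 * df))
```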

M. Du, B. Liang, HW, P. Xu, Z. Luo, Y. Wu. SCPMA 67, 230412 (2024).

  • Motivation: to pre-process data for the Global Fit and to give early warning of massive black hole binary (MBHB) mergers for joint electromagnetic observations.

(Based on 1912.02762)

The ABC of Normalizing Flow 

The main idea of flow-based modeling is to express $\mathbf{y}\in\mathbb{R}^D$ as a transformation $T$ of a real vector $\mathbf{z}\in\mathbb{R}^D$ sampled from $p_{\mathrm{z}}(\mathbf{z})$:

$$\mathbf{y}=T(\mathbf{z}) \quad \text{where} \quad \mathbf{z} \sim p_{\mathrm{z}}(\mathbf{z})$$

Note: The invertible and differentiable transformation $T$ and the base distribution $p_{\mathrm{z}}(\mathbf{z})$ can have parameters $\{\boldsymbol{\phi}, \boldsymbol{\psi}\}$ of their own, i.e. $T_{\boldsymbol{\phi}}$ and $p_{\mathrm{z},\boldsymbol{\psi}}(\mathbf{z})$.

Change of Variables:

$$p_{\mathrm{y}}(\mathbf{y})=p_{\mathrm{z}}(\mathbf{z})\left|\operatorname{det} J_{T}(\mathbf{z})\right|^{-1} \quad \text{where} \quad \mathbf{z}=T^{-1}(\mathbf{y}).$$

The Jacobian $J_{T}(\mathbf{z})$ is the $D \times D$ matrix of all partial derivatives of $T$ given by:

$$J_{T}(\mathbf{z})=\left[\begin{array}{ccc} \frac{\partial T_{1}}{\partial \mathrm{z}_{1}} & \cdots & \frac{\partial T_{1}}{\partial \mathrm{z}_{D}} \\ \vdots & \ddots & \vdots \\ \frac{\partial T_{D}}{\partial \mathrm{z}_{1}} & \cdots & \frac{\partial T_{D}}{\partial \mathrm{z}_{D}} \end{array}\right]$$

Equivalently, in terms of the inverse transformation:

$$p_{\mathrm{y}}(\mathbf{y})=p_{\mathrm{z}}\left(T^{-1}(\mathbf{y})\right)\left|\operatorname{det} J_{T^{-1}}(\mathbf{y})\right|$$
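The change-of-variables formula can be checked numerically in a case where the answer is known in closed form. The sketch below assumes an affine transformation $T(\mathbf{z}) = A\mathbf{z} + \mathbf{b}$ with a standard normal base density, for which $\mathbf{y}$ is exactly Gaussian with mean $\mathbf{b}$ and covariance $AA^{\top}$. The matrix `A`, the vector `b`, and the helper names are illustrative choices, not part of the talk.

```python
import numpy as np

def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, I)."""
    d = z.shape[-1]
    return -0.5 * np.sum(z**2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)

# Hypothetical invertible affine transformation T(z) = A z + b in D = 2.
A = np.array([[2.0, 0.5],
              [0.0, 1.5]])
b = np.array([1.0, -1.0])

def T(z):
    return z @ A.T + b

def T_inv(y):
    return np.linalg.solve(A, (y - b).T).T

def flow_logpdf(y):
    """log p_y(y) = log p_z(T^{-1}(y)) + log|det J_{T^{-1}}(y)|."""
    _, logabsdet_A = np.linalg.slogdet(A)
    # For an affine map, J_{T^{-1}} = A^{-1}, so its log|det| is -log|det A|.
    return standard_normal_logpdf(T_inv(y)) - logabsdet_A

def gaussian_logpdf(y, mean, cov):
    """Reference log-density of N(mean, cov), evaluated directly."""
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=-1)
    return -0.5 * (maha + logdet + y.shape[-1] * np.log(2.0 * np.pi))

rng = np.random.default_rng(3)
z = rng.normal(size=(5, 2))
y = T(z)
lp_flow = flow_logpdf(y)
lp_exact = gaussian_logpdf(y, b, A @ A.T)
```

The two log-densities agree to floating-point precision, which is the content of the formula: the Jacobian term accounts for how $T$ stretches volume.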

base density

target density

Rational Quadratic Neural Spline Flows (RQ-NSF)

(Based on 1912.02762)

  • Data: target data $\mathbf{y}\in\mathbb{R}^{11}$ (with condition data $\mathbf{x}$).
  • Task:
    • Fit a flow-based model $p_{\mathrm{y}}(\mathbf{y} ; \boldsymbol{\theta})$ to a target distribution $p_{\mathrm{y}}^{*}(\mathbf{y})$
    • by minimizing the KL divergence with respect to the model's parameters $\boldsymbol{\theta}=\{\boldsymbol{\phi}, \boldsymbol{\psi}\}$,
    • where $\boldsymbol{\phi}$ are the parameters of $T$ and $\boldsymbol{\psi}$ are the parameters of $p_{\mathrm{z}}(\mathbf{z})=\mathcal{N}(0,\mathbb{I})$.
  • Loss function:

$$\begin{aligned} \mathcal{L}(\boldsymbol{\theta}) &=D_{\mathrm{KL}}\left[p_{\mathrm{y}}^{*}(\mathbf{y}) \,\|\, p_{\mathrm{y}}(\mathbf{y} ; \boldsymbol{\theta})\right] \\ &=-\mathbb{E}_{p_{\mathrm{y}}^{*}(\mathbf{y})}\left[\log p_{\mathrm{y}}(\mathbf{y} ; \boldsymbol{\theta})\right]+\mathrm{const.} \\ &=-\mathbb{E}_{p_{\mathrm{y}}^{*}(\mathbf{y})}\left[\log p_{\mathrm{z}}\left(T^{-1}(\mathbf{y} ; \boldsymbol{\phi}) ; \boldsymbol{\psi}\right)+\log \left|\operatorname{det} J_{T^{-1}}(\mathbf{y} ; \boldsymbol{\phi})\right|\right]+\mathrm{const.} \end{aligned}$$

    where the constant is $\mathbb{E}_{p_{\mathrm{y}}^{*}(\mathbf{y})}\left[\log p_{\mathrm{y}}^{*}(\mathbf{y})\right]$, which does not depend on $\boldsymbol{\theta}$.

  • Assuming we have a set of samples $\left\{\mathbf{y}_{n}\right\}_{n=1}^{N}\sim p_{\mathrm{y}}^{*}(\mathbf{y})$, the expectation can be approximated by Monte Carlo:

$$\mathcal{L}(\boldsymbol{\theta}) \approx-\frac{1}{N} \sum_{n=1}^{N}\left[\log p_{\mathrm{z}}\left(T^{-1}\left(\mathbf{y}_{n} ; \boldsymbol{\phi}\right) ; \boldsymbol{\psi}\right)+\log \left|\operatorname{det} J_{T^{-1}}\left(\mathbf{y}_{n} ; \boldsymbol{\phi}\right)\right|\right]+\mathrm{const.}$$

    Minimizing the above Monte Carlo approximation of the KL divergence is equivalent to fitting the flow-based model to the samples $\left\{\mathbf{y}_{n}\right\}_{n=1}^{N}$ by maximum likelihood estimation.
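A minimal sketch of this maximum-likelihood fit, assuming the simplest possible flow: a one-dimensional affine map $y = e^{s} z + m$ with a standard normal base density. This stands in for the rational-quadratic spline flow and is not the model used in the talk; the gradients are derived by hand rather than by automatic differentiation, and the target distribution, learning rate, and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)

# Samples y_n from a toy target p*_y = N(3, 2^2) (an illustrative assumption).
y = 3.0 + 2.0 * rng.normal(size=5000)

def nll(m, s, y):
    """Monte Carlo loss: -mean[log p_z(T^{-1}(y)) + log|det J_{T^{-1}}(y)|].

    For T(z) = exp(s) z + m: T^{-1}(y) = (y - m) exp(-s) and
    log|det J_{T^{-1}}| = -s.
    """
    u = (y - m) * np.exp(-s)
    return np.mean(0.5 * u**2 + 0.5 * np.log(2.0 * np.pi) + s)

m, s = 0.0, 0.0          # flow parameters phi = (m, s)
lr = 0.05
nll_start = nll(m, s, y)
for _ in range(5000):
    u = (y - m) * np.exp(-s)
    grad_m = -np.mean(u) * np.exp(-s)   # d(loss)/dm
    grad_s = 1.0 - np.mean(u**2)        # d(loss)/ds
    m -= lr * grad_m
    s -= lr * grad_s
nll_end = nll(m, s, y)

fitted_mean, fitted_std = m, np.exp(s)
```

For this model the maximum-likelihood solution is known: the fitted mean and scale converge to the sample mean and sample standard deviation, which makes the sketch easy to verify.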
Training (normalizing direction):

$\vec\theta = (m_1, m_2, d_L, \ldots) \sim P_{\mathrm{prior}}$, $\quad \vec{x}=\vec{h}_{\vec{\theta}} + \vec{n}$, $\quad \vec{z} \Rightarrow \mathcal{N}(0,\mathbb{I})$

Inference (generating direction):

$\vec{z} \sim \mathcal{N}(0,\mathbb{I})$, $\quad \vec{x}=\vec{h}_{\vec{\theta}} + \vec{n}$, $\quad \vec\theta = (m_1, m_2, d_L, \ldots) \sim P_{\mathrm{posterior}}$
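The two directions in this diagram can be sketched with a toy conditional flow. Parameters are drawn from the prior, data are simulated as signal plus noise, and a conditional affine map $\theta = w x + b + \sigma z$ is fitted by maximum likelihood; new posterior samples then only require drawing $z$. The trivial "waveform" $h_\theta = \theta$, the unit Gaussian prior, and the noise level are assumptions chosen so that the exact posterior is known; a real application uses a neural network conditioned on the data instead.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma_n = 0.5            # noise standard deviation (illustrative)
n_train = 100_000

# Training set: theta ~ prior N(0, 1), x = h_theta + n with h_theta = theta.
theta = rng.normal(size=n_train)
x = theta + sigma_n * rng.normal(size=n_train)

# Conditional affine flow: theta = w * x + b + sigma * z, with z ~ N(0, 1).
# For this model, maximizing the conditional likelihood reduces to least
# squares for (w, b) and the residual RMS for sigma.
design = np.stack([x, np.ones_like(x)], axis=1)
(w, b), *_ = np.linalg.lstsq(design, theta, rcond=None)
sigma = np.sqrt(np.mean((theta - (w * x + b)) ** 2))

def sample_posterior(x_obs, n_samples, rng):
    """Generating direction: push base samples z through the fitted map."""
    z = rng.normal(size=n_samples)
    return w * x_obs + b + sigma * z

samples = sample_posterior(1.2, 10_000, rng)

# Exact posterior for this toy model: N(x / (1 + sigma_n^2), sigma_n^2 / (1 + sigma_n^2)).
exact_mean = 1.2 / (1.0 + sigma_n**2)
exact_std = np.sqrt(sigma_n**2 / (1.0 + sigma_n**2))
```

Once the fit is done, sampling for any new observation costs only random draws and one affine map, which is the source of the speed-up of amortized inference.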




Rapid PE for Space-borne GW Detection

  • Results:

M. Du, B. Liang, HW, P. Xu, Z. Luo, Y. Wu. SCPMA 67, 230412 (2024).

  • Computational performance

    • 10,000 samples in 2.7 s

  • Multimodality in extrinsic parameters

  • Unbiased estimation and confidence validation
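A common way to validate confidence levels is a P-P test: for many simulated events, record the fraction of posterior samples that lie below the true parameter value. If the posteriors are calibrated, these fractions are uniformly distributed, so their empirical CDF follows the diagonal. The sketch below uses a toy linear-Gaussian model with an exactly known posterior; the model and the event and sample counts are assumptions, and the slide does not specify which validation procedure was used.

```python
import numpy as np

rng = np.random.default_rng(6)
sigma_n = 0.5
n_events, n_samples = 500, 2000

# Exact posterior of the toy model theta ~ N(0, 1), x = theta + noise.
post_w = 1.0 / (1.0 + sigma_n**2)
post_std = np.sqrt(sigma_n**2 / (1.0 + sigma_n**2))

# Simulated "injections": true parameters and their noisy observations.
theta_true = rng.normal(size=n_events)
x_obs = theta_true + sigma_n * rng.normal(size=n_events)

# Posterior samples for every event, shape (n_events, n_samples).
post = post_w * x_obs[:, None] + post_std * rng.normal(size=(n_events, n_samples))

# Percentile of the true value within each event's posterior.
percentiles = np.mean(post < theta_true[:, None], axis=1)

# P-P curve: fraction of events whose true value lies below each level.
credible_level = np.linspace(0.0, 1.0, 101)
empirical_cdf = np.array([np.mean(percentiles <= c) for c in credible_level])
max_deviation = np.max(np.abs(empirical_cdf - credible_level))
```

A posterior that is too narrow or biased bends the curve away from the diagonal, so `max_deviation` grows well beyond the statistical scatter expected for the number of events.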

Ongoing and Future Projects

Pipeline | Targets | Programming language (sampling method) | Comments
--- | --- | --- | ---
(Littenberg & Cornish 2023) | | C / Python (TPMCMC / RJMCMC) | noise_mcmc+gb_mcmc+vb_mcmc+global_fit
Eryn | UCB | Python (TPMCMC / RJMCMC) | Mini code for UCB case
PyCBC-INFERENCE | MBHB | Python (?) | Unavailable
Bilby in Space / tBilby | MBHB / ? | ? / Python? (RJMCMC) | Unavailable
Strub et al. | UCB | ? (GP) | Unavailable / GPU-based
Zhang et al. (LZU) | UCB | ? (PSO) | MLP
Balrog | MBHB | ? |

(Sec.8.6 Red Book)

Global Fit

  • The idea of the global fit method is to comprehensively model all astrophysical and instrumental features present in the space-borne gravitational wave data.
  • This approach not only focuses on the signal from a single source, but attempts to capture the combined effects of all sources in the data, conducting a comprehensive analysis of the entire dataset to identify and model all potential signal and noise sources.

Technical challenges:

  • High-dimensional
  • Highly correlated
  • Multimodal
  • Trans-dimensional


Ongoing and Future Projects

Neural density estimation

  • Density fit for posterior distributions
    • Use the old posterior to form a proposal for the extended data.
  • Density fit for the Galaxy
    • Fit a Galaxy model for the joint distribution of $(A, \beta, \lambda)$.
  • ...
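The first item can be sketched as an independence Metropolis-Hastings sampler: a density is fitted to posterior samples from an earlier, shorter data set and then used as the proposal when the data are extended. In the sketch below a single Gaussian stands in for the neural density estimator, and the problem (inferring the mean of unit-variance Gaussian data) is a toy chosen so the answer is known. None of these choices come from the talk or from the works cited on this slide.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data: an "old" segment and an extended segment that contains it.
data_old = 0.3 + rng.normal(size=50)
data_ext = np.concatenate([data_old, 0.3 + rng.normal(size=50)])

def log_post(mu, data):
    """Log-posterior for the mean of unit-variance Gaussian data, flat prior."""
    return -0.5 * np.sum((data - mu) ** 2)

# "Old" posterior samples (drawn here from the known analytic posterior).
old_samples = data_old.mean() + rng.normal(size=5000) / np.sqrt(data_old.size)

# Density fit: a Gaussian stand-in for a neural density estimator.
q_mean, q_std = old_samples.mean(), old_samples.std()

def log_q(mu):
    return -0.5 * ((mu - q_mean) / q_std) ** 2

# Independence Metropolis-Hastings on the extended data.
n_steps = 20_000
chain = np.empty(n_steps)
current = q_mean
current_lp, current_lq = log_post(current, data_ext), log_q(current)
n_accept = 0
for i in range(n_steps):
    proposal = q_mean + q_std * rng.normal()
    prop_lp, prop_lq = log_post(proposal, data_ext), log_q(proposal)
    # Acceptance ratio: [pi(new) q(old)] / [pi(old) q(new)].
    log_alpha = (prop_lp - current_lp) + (current_lq - prop_lq)
    if np.log(rng.uniform()) < log_alpha:
        current, current_lp, current_lq = proposal, prop_lp, prop_lq
        n_accept += 1
    chain[i] = current
acceptance_rate = n_accept / n_steps
```

Because the old posterior is broader than the new one, it covers the new posterior's support and the proposal does not depend on the current state, which helps with the correlation and multimodality issues listed for the Global Fit.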



  • Ashton, G, and C Talbot. MNRAS 507, no. 2 (2021): 2037–51.
  • Korsakova, N, et al. (2402.13701)
  • Wouters, T, et al. (2404.11397)


num_of_audiences = 3  # placeholder: set to the actual audience size
for _ in range(num_of_audiences):
    print('Thank you for your attention! 🙏')
