Gravitational-Wave Data Analysis via Deep Learning

He Wang (王赫)  


Department of Physics, Beijing Normal University

On behalf of the KAGRA collaboration


In collaboration with Prof. Zhou-Jian Cao

Jan 10th, 2020

Key Laboratory of Computational Geodynamics, CAS


  • Introduction
    • Background
    • Basics of data analysis
  • GW detection in simulated Gaussian noise
    • Our past attempts
  • MF-ConvNet Model
    • Motivation
    • Matched-filtering in time domain
    • Matched-filtering ConvNet (MF-CNN)
  • Experiments & Results
    • Dataset & Training details
    • Recovering GW Events
    • Population property on O1
  • Summary


Event GW150914

A chirp signal from gravitational waves emitted by two coalescing black holes was observed with the LIGO detectors by the LIGO-Virgo Consortium on September 14, 2015.








Laser interferometer detectors



Observation run O1: September 12, 2015 - January 19, 2016


B. P. Abbott et al., Prospects for Observing and Localizing Gravitational-Wave Transients with Advanced LIGO, Advanced Virgo and KAGRA, 2016, Living Rev. Relativity 19

Abbott et al, PRX 6, 041015 (2016)



Multi-messenger astrophysics

Zhou-Jian Cao. From gravitational-wave detection to multi-messenger astronomy including gravitational waves [J]. College Physics, 2018, 37(2) (in Chinese).

Event GW170817 (the first binary neutron star merger event)

LIGO and Virgo make first detection of gravitational waves produced by colliding neutron stars. Discovery marks first cosmic event observed in both gravitational waves and light.




  1. Direct link between NS-NS merger and GRB
  2. Direct link between NS-NS merger and kilonova
  3. Identifies NS-NS mergers as an origin of heavy elements
  4. Consistent with theoretical models


2010 Class. Quantum Grav. 27 084005

Data analysis

  • Zhou-Jian Cao. Gravitational-wave detection and gravitational-wave astronomy [J]. Modern Physics, 2015, v.27;No.161(05):42-46 (in Chinese).
  • LIGO Scientific Collaboration, and Virgo Collaboration. "A guide to LIGO-Virgo detector noise and extraction of transient gravitational-wave signals." arXiv:1908.11170 (2019).

Basics of Data Analysis




Data analysis and Matched-filtering techniques

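  • Zhou-Jian Cao, Zhi-Hui Du. Numerical relativity and gravitational-wave astronomy [J]. Scientia Sinica Physica, Mechanica & Astronomica (1):72 (in Chinese).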
  • LIGO Scientific Collaboration, and Virgo Collaboration. "A guide to LIGO-Virgo detector noise and extraction of transient gravitational-wave signals." arXiv:1908.11170 (2019).


Whitening and filtering

  • Nothing can be seen in the raw data
  • The signal is located around t = 0, but no special feature is visible there


  • Zoom in on the time region where the signal is located
  • The data there are apparently dominated by some characteristic behavior of the detector


  • Rule out data that surely have nothing to do with the GW signal (band-pass filter, band-stop filter)


  • Calculate the matched-filtering SNR for a target template
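The whitening and filtering steps listed above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `estimate_psd` and `whiten_bandpass` are hypothetical, the PSD is estimated from averaged Hann-windowed periodograms of the data itself, the band-pass is a simple brick-wall cut in the frequency domain (20 Hz to 500 Hz chosen arbitrarily), and the "detector" data are synthetic coloured Gaussian noise rather than LIGO strain.

```python
import numpy as np

def estimate_psd(x, fs, nperseg):
    """One-sided PSD from averaged, Hann-windowed periodograms (no overlap)."""
    nseg = len(x) // nperseg
    segs = x[: nseg * nperseg].reshape(nseg, nperseg)
    win = np.hanning(nperseg)
    spec = np.fft.rfft(segs * win, axis=1)
    psd = 2.0 * np.mean(np.abs(spec) ** 2, axis=0) / (fs * np.sum(win ** 2))
    return np.fft.rfftfreq(nperseg, d=1.0 / fs), psd

def whiten_bandpass(x, fs, f_low, f_high, nperseg):
    """Whiten x by its own PSD estimate, then keep only [f_low, f_high]."""
    f_psd, psd = estimate_psd(x, fs, nperseg)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    x_f = np.fft.rfft(x)
    # Dividing by sqrt(S_n * fs / 2) flattens the spectrum to unit variance.
    x_f = x_f / np.sqrt(np.interp(freqs, f_psd, psd) * fs / 2.0)
    # Brick-wall band-pass: drop everything outside the band of interest.
    x_f[(freqs < f_low) | (freqs > f_high)] = 0.0
    return np.fft.irfft(x_f, n=len(x))

# Synthetic stand-in for raw data: white noise coloured towards low frequencies.
fs = 4096
rng = np.random.default_rng(0)
n = 16 * fs
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
colouring = 1.0 + 50.0 * np.exp(-freqs / 100.0)
raw = np.fft.irfft(np.fft.rfft(rng.normal(size=n)) * colouring, n=n)

clean = whiten_bandpass(raw, fs, f_low=20.0, f_high=500.0, nperseg=fs)
```

After this step the in-band spectrum of `clean` is approximately flat, which is what makes a short transient visible (or matchable) against the noise.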


GW Detection in Simulated Background Noise

  • Zhou-Jian Cao, He Wang, Jian-Yang Zhu. A preliminary study of the application of deep learning to gravitational-wave data processing [J]. Journal of Henan Normal University (Natural Science Edition), 2018, v.46;No.199(02):2+32-45 (in Chinese).
  • He Wang et al. Representation Learning of Noisy Gravitational Waves by Convolutional Networks. In preparation (2019)
  • Problems
    • Current matched-filtering techniques are computationally expensive.
    • Non-Gaussian noise limits the optimality of searches.
    • It is difficult to find GW signals beyond theoretical expectations. Un-modelled signals?


  • Existing CNN-based approaches:
    • Daniel George & E. A. Huerta (2018)
    • Hunter Gabbard et al. (2018)
    • X. Li et al. (2018)
    • Timothy D. Gebhard et al. (2019)


Machine Learning / Deep Learning

Map / Algorithm: \mathbf{y} = f(\mathbf{w}\cdot\mathbf{x}+\mathbf{b})

  • A number
  • A sequence: \{a_1, a_2, \dots, a_n\}
  • Yes or No: \{0 \text{ or } 1\}
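As a minimal sketch of the map \(\mathbf{y} = f(\mathbf{w}\cdot\mathbf{x}+\mathbf{b})\), the snippet below evaluates a single unit on a short sequence and thresholds the result into a yes/no answer. The sigmoid activation, the weights, the bias and the 0.5 threshold are all illustrative assumptions, not values from the talk.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    """Evaluate y = f(w . x + b) for an input sequence x."""
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # hypothetical input sequence {a_1, ..., a_n}
w = np.array([1.0, 0.5, 0.25])   # hypothetical weights
b = 0.0                          # hypothetical bias

score = neuron(x, w, b)          # a number in (0, 1)
label = int(score > 0.5)         # yes (1) or no (0)
```

A deep network stacks many such maps; training adjusts `w` and `b` so that the final yes/no output agrees with labelled examples.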

Our model / network

Past attempts

  • Initial model / network


Convolutional neural network (ConvNet or CNN)

Feature extraction

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012)

Visualization of the high-dimensional feature maps of the trained network, layer by layer, for the two classes using t-SNE.

  • A glimpse of model interpretability and visualization
  • The influence of hyperparameters?

Effect of the number of convolutional layers on signal recognition accuracy.


  • A glimpse of model interpretability through visualization

Visualization of the top average activation at the \(n\)th layer, projected back to the time domain using the deconvolutional network approach


Occlusion Sensitivity

Peak of GW!
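Occlusion sensitivity can be illustrated with a short sketch: mask one window of the input at a time and record how much the classifier score drops. Everything here is a toy assumption: `toy_score` is a hypothetical stand-in for a trained network (a sigmoid of the overlap with a known burst), and the window width and step are arbitrary.

```python
import numpy as np

def occlusion_sensitivity(score_fn, x, width, step):
    """Score drop caused by zeroing each window [s, s + width) of x."""
    base = score_fn(x)
    starts = np.arange(0, len(x) - width + 1, step)
    drops = np.empty(len(starts))
    for i, s in enumerate(starts):
        occluded = x.copy()
        occluded[s : s + width] = 0.0      # hide this part of the input
        drops[i] = base - score_fn(occluded)
    return starts, drops

# Toy input: a burst peaking at sample 300 (stand-in for the GW peak).
n = np.arange(512)
burst = np.exp(-((n - 300) / 20.0) ** 2)

def toy_score(x):
    """Hypothetical classifier: sigmoid of the overlap with the burst."""
    return 1.0 / (1.0 + np.exp(-(np.dot(x, burst) - 12.5) / 5.0))

starts, drops = occlusion_sensitivity(toy_score, burst, width=32, step=16)
most_sensitive = int(starts[np.argmax(drops)])   # window start with largest drop
```

In this toy case the largest drop occurs for the window covering the burst peak, mirroring the "Peak of GW!" observation on the slide.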


  • However, on real LIGO noise this approach does not work that well (too sensitive, and the events are hard to find).

A specific design of the architecture is needed [as in Timothy D. Gebhard et al. (2019)].

MF-ConvNet Model

  • He Wang, Zhoujian Cao, et al. "Gravitational wave signal recognition of O1 data by deep learning". e-Print: arXiv:1909.13442 [gr-qc]
  • Motivation
  • Matched-filtering in time domain
  • Matched-filtering ConvNet

Matched-filtering (cross-correlation with the templates) can be regarded as a convolutional layer with a set of predefined kernels.


Is it matched-filtering?

Matched-filtering in time domain

The square of the matched-filtering SNR for given data \(d(t) = n(t)+h(t)\):

\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2

Frequency domain

\langle d|h \rangle (t) = 4\int^\infty_0\frac{\tilde{d}(f)\tilde{h}^*(f)}{S_n(f)}e^{2\pi ift}df
\langle h|h \rangle = 4\int^\infty_0\frac{\tilde{h}(f)\tilde{h}^*(f)}{S_n(f)}df

\(S_n(|f|)\) is the one-sided average PSD of \(d(t)\)
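The frequency-domain expressions for \(\langle d|h \rangle(t)\), \(\langle h|h \rangle\) and \(\rho(t)\) can be evaluated with FFTs. The following is a sketch under stated assumptions, not the pipeline used in the talk: `matched_filter_snr` is a hypothetical helper, the noise is white Gaussian with a known flat one-sided PSD, and the "template" is a toy windowed chirp rather than a binary black hole waveform.

```python
import numpy as np

def matched_filter_snr(d, h, psd, fs):
    """rho(t) = |<d|h>(t)| / sqrt(<h|h>) on the sample grid of d."""
    n = len(d)
    dt, df = 1.0 / fs, fs / n
    d_f = np.fft.rfft(d) * dt                      # approximates d~(f)
    h_f = np.fft.rfft(h) * dt                      # approximates h~(f)
    hh = 4.0 * df * np.sum(np.abs(h_f) ** 2 / psd)
    # Put the one-sided integrand on a full-length grid so that the inverse
    # FFT supplies the e^{2 pi i f t} factor for every time shift t at once.
    z = np.zeros(n, dtype=complex)
    z[: n // 2 + 1] = d_f * np.conj(h_f) / psd
    dh = 4.0 * df * n * np.fft.ifft(z)
    return np.abs(dh) / np.sqrt(hh)

fs = 4096
n = 4 * fs
t = np.arange(n) / fs
# Toy template: a short windowed chirp (an assumption for illustration).
h = np.exp(-((t - 0.5) / 0.05) ** 2) * np.sin(2 * np.pi * (50 * t + 100 * t ** 2))

sigma = 1.0
psd = np.full(n // 2 + 1, 2.0 * sigma ** 2 / fs)   # flat one-sided PSD of white noise

optimal = matched_filter_snr(h, h, psd, fs)[0]     # sqrt(<h|h>) of the unscaled template
shift = 2 * fs                                     # inject the signal 2 s later
rng = np.random.default_rng(1)
d = rng.normal(scale=sigma, size=n) + (20.0 / optimal) * np.roll(h, shift)

rho = matched_filter_snr(d, h, psd, fs)
peak = int(np.argmax(rho))                         # sample index of the SNR peak
```

The injected signal is scaled to an optimal SNR of 20, so `rho` peaks near that value at a time shift close to `shift` samples.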



Time domain

\langle d|h \rangle (t) \sim \,\bar{d}(t)\ast\bar{h}(-t)
\langle h|h \rangle \sim [\bar{h}(t) \ast \bar{h}(-t)]|_{t=0}
\left\{\begin{matrix} \bar{d}(t) = d(t) * \bar{S}_n(t) \\ \bar{h}(t) = h(t) * \bar{S}_n(t) \end{matrix}\right.
\bar{S}_n(t)=\int^{+\infty}_{-\infty}S_n^{-1/2}(f)e^{2\pi ift}df

(A schematic illustration of one unit of a convolutional layer)

In the 1-D convolution (\(*\)), given input data with shape [batch size, channel, length]:

output[n, i, :] = \sum_{j=0}^{channel-1} input[n,j,:] \ast weight[i,j,:]

FYI: the output length is \(N_\ast = \lfloor(N-K+2P)/S\rfloor+1\), for input length \(N\), kernel size \(K\), padding \(P\) and stride \(S\).

Wrapping (like the pooling layer)
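To make the correspondence concrete, here is a small NumPy sketch of a 1-D convolutional layer whose kernels are templates. As in most deep learning frameworks, the "convolution" is implemented as a cross-correlation, which is exactly \(\bar{d}(t)\ast\bar{h}(-t)\). The function `conv1d`, the two toy templates and the assumption that the data are already whitened (white noise, so no \(\bar{S}_n\) step) are illustrative choices, not the layer used in the talk.

```python
import numpy as np

def conv1d(x, weight, stride=1, padding=0):
    """Cross-correlation layer: x is [batch, c_in, N], weight is [c_out, c_in, K]."""
    batch, c_in, n = x.shape
    c_out, _, k = weight.shape
    x = np.pad(x, ((0, 0), (0, 0), (padding, padding)))
    n_out = (n - k + 2 * padding) // stride + 1    # N_* from the formula above
    out = np.zeros((batch, c_out, n_out))
    for t in range(n_out):
        window = x[:, :, t * stride : t * stride + k]        # [batch, c_in, K]
        out[:, :, t] = np.einsum("bjk,ijk->bi", window, weight)
    return out

rng = np.random.default_rng(2)
k, n, offset = 256, 2048, 700
tau = np.arange(k)
templates = np.stack([
    np.hanning(k) * np.sin(2 * np.pi * (0.02 * tau + 0.0003 * tau ** 2)),  # toy chirp
    np.hanning(k) * np.sin(2 * np.pi * 0.03 * tau),                        # toy sinusoid
])

data = 0.1 * rng.normal(size=n)            # stand-in for whitened noise
data[offset : offset + k] += templates[0]  # hide the first template in the data

x = data[None, None, :]                    # [batch=1, channel=1, length=N]
weight = templates[:, None, :]             # [templates, channel=1, K]
out = conv1d(x, weight)                    # [1, 2, N - K + 1]

best_template = int(np.argmax(out.max(axis=2)[0]))
best_offset = int(np.argmax(out[0, best_template]))
```

Each output channel is the matched-filter response of one template, so the template bank plays the role of a set of predefined kernels.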

\rho[1,C,N] = \frac{U[1,C,N]}{\sqrt{\sigma[1,C,0]\cdot fs}}
\rho_m[1,C,1] = \max_N{\rho[1,C,N]}

Meanwhile, we can obtain the optimal time \(N_0\) (relative to the input) of the matching feature response by recording the location of the maximum value corresponding to the optimal template \(C_0\):

C_0 = \mathop{\arg\max}_{C}\rho[1,C,N] \,,\\ N_0 = \mathop{\arg\max}_{N} U[1,C_0,N]
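The normalisation and the two arg-max steps can be sketched on arrays of the stated shapes. The values of `U` and `sigma` below are made up for illustration; the sketch assumes that `U` holds the per-template responses \(\langle d|h \rangle\) and `sigma` the per-template normalisations \(\langle h|h \rangle\), and it picks \(C_0\) from the per-template maxima \(\rho_m\).

```python
import numpy as np

fs = 4096
rng = np.random.default_rng(4)

# Hypothetical layer outputs: 3 templates (C = 3), 100 time steps (N = 100).
U = rng.normal(size=(1, 3, 100))
U[0, 1, 40] = 50.0                                   # strong response: template 1, step 40
sigma = np.array([1.0, 4.0, 2.0]).reshape(1, 3, 1)   # made-up <h|h> per template

rho = U / np.sqrt(sigma * fs)                        # rho[1, C, N]
rho_m = rho.max(axis=2, keepdims=True)               # rho_m[1, C, 1], max over N

C0 = int(np.argmax(rho_m[0, :, 0]))                  # optimal template index
N0 = int(np.argmax(U[0, C0, :]))                     # optimal time for that template
```

Taking the maximum over the time axis behaves like a global max-pooling layer, which is why the whole chain fits naturally into a ConvNet.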
Experiments & Results

  • He Wang, Zhoujian Cao, et al. "Gravitational wave signal recognition of O1 data by deep learning". e-Print: arXiv:1909.13442 [gr-qc]
  • Dataset & Templates
  • Training Strategy
  • Search methodology
  • Recovering GW Events
  • Population property on O1

Dataset & Templates

62.50M⊙ + 57.50M⊙ (\(\rho_{amp}=0.5\))

  • The background noise for training/testing is sampled from a closed set (33 × 4096 s) of the first observation run (O1), excluding the 4096 s segments containing the first 3 GW events.
  • We use the SEOBNRE model [Cao et al. (2017)] to generate waveforms; we only consider circular, non-spinning binary black holes.

                template    waveform (train/test)
  Number        35          1610
  Length (s)    1           5
  equal mass

FYI: sampling rate = 4096 Hz

(In preprint)


Training Strategy

(sigmoid function)

  • GPUs: 4 NVIDIA GeForce GTX 1080Ti
  • MXNet: A Scalable Deep Learning Framework
  • Tukey window applied to both data and templates before they are fed into the network.
  • Xavier initialization [X Glorot & Y Bengio (2010)]
  • Binary softmax cross-entropy loss
  • Optimizer: Adam [Diederik P. Kingma & Jimmy Ba (2014)]
  • Learning rate: 0.003
  • Batch size: 16 x 4
  • Curriculum learning: the signal strength of the training data is decreased step by step, with \(\rho_{amp}\) set to 1, 0.1, 0.03 and 0.02 in turn.
\rho_{amp} = \frac{\max_t h}{\sqrt{\sigma_{noise}}}
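The curriculum over \(\rho_{amp}\) can be sketched as a rescaling of the waveform before it is added to the noise. This is a sketch only: it assumes that \(\sigma_{noise}\) denotes the noise variance, uses synthetic Gaussian noise and a toy burst in place of O1 data and SEOBNRE waveforms, and the helper `inject` is hypothetical.

```python
import numpy as np

def inject(noise, h, rho_amp):
    """Scale h so that max_t(h) / sqrt(var(noise)) equals rho_amp, then add it."""
    scale = rho_amp * np.sqrt(np.var(noise)) / np.max(h)
    return noise + scale * h, scale

fs = 4096
rng = np.random.default_rng(3)
noise = rng.normal(scale=2.0, size=5 * fs)          # stand-in for a 5 s noise segment
t = np.arange(5 * fs) / fs
h = np.exp(-((t - 2.5) / 0.1) ** 2) * np.sin(2 * np.pi * 100 * t)   # toy waveform

curriculum = [1.0, 0.1, 0.03, 0.02]                 # stages from easy to hard
achieved = []
for rho_amp in curriculum:
    data, scale = inject(noise, h, rho_amp)
    # A training loop would fit the model on `data` here before moving on.
    achieved.append(np.max(scale * h) / np.sqrt(np.var(noise)))
```

Starting with loud signals and lowering \(\rho_{amp}\) in stages lets the network first learn the signal morphology and then adapt to progressively weaker injections.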

Search methodology

  • Each 5-second segment is fed to our MF-CNN as input, with a step size of 1 second.
  • The model scans the whole range of the input segment and outputs a probability score.
  • In the ideal case, a GW signal hidden somewhere in the data yields 5 adjacent predictions above a given threshold.
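A short sketch shows why 5 adjacent predictions are expected: with 5-second windows advanced by 1 second, any given instant falls inside exactly 5 consecutive windows (away from the edges of the data). The helper names and the 20-second toy stretch are hypothetical.

```python
def window_starts(total_s, window_s=5, step_s=1):
    """Start times (in seconds) of all full windows in a stretch of data."""
    return list(range(0, total_s - window_s + 1, step_s))

def windows_covering(t, starts, window_s=5):
    """Start times of the windows whose span [s, s + window_s) contains time t."""
    return [s for s in starts if s <= t < s + window_s]

starts = window_starts(20)               # windows starting at 0, 1, ..., 15
hits = windows_covering(12.3, starts)    # windows that see a signal at t = 12.3 s
```

Here `hits` is `[8, 9, 10, 11, 12]`: five adjacent windows, each of which should ideally return a score above the threshold.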

  • Recovering all GW events in both O1 and O2


Population property on O1

  • Statistical significance on O1
    • Count a group of adjacent predictions as one "trigger block".
    • For pure background (non-Gaussian), a monotonic trend should be observed.
    • In the ideal case, a GW signal hidden somewhere in the data yields 5 adjacent predictions above a given threshold.

[Figure: distribution over the number of adjacent predictions, with a bump at 5 adjacent predictions]

  • Sensitivity estimation (ROC)
    • Background: time-shifting on the closed set of real LIGO recordings in O1
    • Injection: random simulated waveforms

[Figure: ROC curve, true positive rate versus false alarm rate]
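Grouping adjacent predictions into trigger blocks amounts to counting runs of consecutive above-threshold scores. The sketch below uses made-up scores and a hypothetical threshold of 0.5; `trigger_blocks` is an illustrative helper, not code from the preprint.

```python
from collections import Counter

def trigger_blocks(scores, threshold):
    """Lengths of runs of consecutive scores above the threshold."""
    blocks, run = [], 0
    for s in scores:
        if s > threshold:
            run += 1                 # extend the current trigger block
        elif run:
            blocks.append(run)       # the block just ended
            run = 0
    if run:
        blocks.append(run)           # block reaching the end of the data
    return blocks

# Made-up probability scores from consecutive 1-second steps.
scores = [0.1, 0.9, 0.2, 0.8, 0.95, 0.9, 0.99, 0.97, 0.1, 0.7, 0.6, 0.3]
blocks = trigger_blocks(scores, threshold=0.5)
histogram = Counter(blocks)          # number of blocks per block length
```

For these scores `blocks` is `[1, 5, 2]`; a histogram of block lengths over a whole run is what reveals an excess at 5 adjacent predictions on top of the monotonic background trend.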

  • Some benefits of the MF-CNN architecture
    • Simple configuration for GW data generation
    • Almost no data pre-processing
    • Works on non-stationary background
    • Easy parallel deployment; multiple detectors can benefit a lot from this design
    • More templates / smaller search steps can improve the performance further


  • Main understanding of the algorithms:
    • GW templates are used as likely features for matching
    • Generalization of both matched-filtering and neural networks
    • Matched-filtering can be rewritten as convolutional neural layers


Thank you for your attention!


By He Wang