Gravitational-Wave Data Analysis

via Deep Learning

He Wang (王赫)  

[hewang@mail.bnu.edu.cn]

Department of Physics, Beijing Normal University

On behalf of the KAGRA collaboration

 

In collaboration with Prof. Zhou-Jian Cao

Jan 10th, 2020

Key Laboratory of Computational Geodynamics, CAS

Outline

  • Introduction
    • Background
    • Basics of data analysis
  • GW detection in simulated Gaussian noise
    • Our past attempts
  • MF-ConvNet Model
    • Motivation
    • Matched-filtering in time domain
    • Matched-filtering ConvNet (MF-CNN)
  • Experiments & Results
    • Dataset & Training details
    • Recovering GW Events
    • Population properties on O1
  • Summary

Introduction

Event GW150914

A chirp signal from gravitational waves emitted by two coalescing black holes was observed with the LIGO detectors by the LIGO-Virgo Collaboration on September 14, 2015.

Background



Introduction

Laser interferometer detectors

Observation run O1: September 12, 2015 - January 19, 2016

 

B. P. Abbott et al., Prospects for Observing and Localizing Gravitational-Wave Transients with Advanced LIGO, Advanced Virgo and KAGRA, 2016, Living Rev. Relativity 19

Abbott et al., PRX 6, 041015 (2016)

Background

Introduction

Multi-messenger astrophysics

Zhou-Jian Cao. From Gravitational-Wave Detection to Multi-Messenger Astronomy Including Gravitational Waves [J]. 大学物理 (College Physics), 2018, 37(2).

Event GW170817 (the first binary neutron star merger event)

LIGO and Virgo make first detection of gravitational waves produced by colliding neutron stars. The discovery marks the first cosmic event observed in both gravitational waves and light.


Objective

  1. Direct link between NS-NS mergers and GRBs
  2. Direct link between NS-NS mergers and kilonovae
  3. Determine NS-NS mergers as the origin of heavy elements
  4. Consistency with theoretical models

Introduction

2010 Class. Quantum Grav. 27 084005

Data analysis

  • Zhou-Jian Cao. Gravitational-Wave Detection and Gravitational-Wave Astronomy [J]. 现代物理知识 (Modern Physics), 2015, 27(5): 42-46.
  • LIGO Scientific Collaboration, and Virgo Collaboration. "A guide to LIGO-Virgo detector noise and extraction of transient gravitational-wave signals." arXiv:1908.11170 (2019).
 
 

Basics of Data Analysis



Introduction

Data analysis and Matched-filtering techniques

  • Zhou-Jian Cao, Zhi-Hui Du. Numerical Relativity and Gravitational-Wave Astronomy [J]. 中国科学: 物理学 力学 天文学 (Scientia Sinica Physica, Mechanica & Astronomica), (1): 72.
  • LIGO Scientific Collaboration, and Virgo Collaboration. "A guide to LIGO-Virgo detector noise and extraction of transient gravitational-wave signals." arXiv:1908.11170 (2019).
 
 

Whitening and filtering

  • Nothing can be seen from the raw data
  • The signal is located around t = 0, yet no special feature is visible there

  • Enlarge the time region where the signal is located
  • Apparently it is dominated by some characteristic behavior of the detector

  • Rule out data that clearly has nothing to do with the GW signal (band-pass filter, band-stop filter)

  • Calculate the matched-filtering SNR for a target template

https://www.gw-openscience.org

Basics of Data Analysis
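To make the whitening, band-pass filtering and matched-filtering steps above concrete, here is a minimal NumPy/SciPy sketch following the conventions of the GWOSC open-data tutorials. It is an illustration, not the LIGO-Virgo production pipeline, and all function names are ours.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch

fs = 4096  # sampling rate (Hz)

def estimate_psd(strain, fs):
    """One-sided average PSD S_n(|f|) of the data, via Welch's method."""
    return welch(strain, fs=fs, nperseg=4 * fs)

def whiten(x, psd_freqs, psd, fs):
    """Whitening: divide the rFFT of x by the (interpolated) amplitude spectral density."""
    dt = 1.0 / fs
    xf = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), dt)
    return np.fft.irfft(xf / np.sqrt(np.interp(f, psd_freqs, psd) / (2.0 * dt)), n=len(x))

def bandpass(x, fs, f_lo=20.0, f_hi=500.0):
    """Band-pass: rule out frequency bands that carry no BBH signal."""
    b, a = butter(4, [f_lo / (0.5 * fs), f_hi / (0.5 * fs)], btype="band")
    return filtfilt(b, a, x)

def mf_snr(data_white, template_white):
    """Matched-filtering SNR time series rho(t) for whitened data and a whitened template."""
    corr = np.correlate(data_white, template_white, mode="valid")   # ~ <d|h>(t)
    return np.abs(corr) / np.sqrt(np.sum(template_white ** 2))      # / sqrt(<h|h>)

# Usage sketch: `strain` is raw detector data, `template` a waveform, both sampled at fs.
# freqs, psd = estimate_psd(strain, fs)
# d_w = bandpass(whiten(strain, freqs, psd, fs), fs)
# h_w = bandpass(whiten(template, freqs, psd, fs), fs)
# print(mf_snr(d_w, h_w).max())
```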


GW Detection in Simulated Background Noise

  • Zhou-Jian Cao, He Wang, Jian-Yang Zhu. A Preliminary Study on the Application of Deep Learning in Gravitational-Wave Data Processing [J]. 河南师范大学学报(自然科学版) (Journal of Henan Normal University, Natural Science Edition), 2018, 46(2): 32-45.
  • He Wang, et al. Representation Learning of Noisy Gravitational Waves by Convolutional Networks. In preparation (2019)
  • Problems
    • Current matched filtering techniques are computationally expensive.
    • Non-Gaussian noise limits the optimality of searches.
    • Difficult to find GW signals beyond theoretical expectations (un-modelled signals?)

Background

  • Existing CNN-based approaches:
    • Daniel George & E. A. Huerta (2018)
    • Hunter Gabbard et al. (2018)
    • X. Li et al. (2018)
    • Timothy D. Gebhard et al. (2019)

Solution:

Machine Learning / Deep Learning

ABC of ML

A map / algorithm takes an input \(\mathbf{x}\) (e.g., a sequence \(\{a_1, a_2, \dots, a_n\}\)) to an output \(\mathbf{y}\) (a number, or a yes/no decision \(\{0 \text{ or } 1\}\)):

\mathbf{y} = f(\mathbf{w}\cdot\mathbf{x}+\mathbf{b})

Our model / network

Past attempts

  • Initial model / network

Convolutional neural network (ConvNet or CNN): feature extraction + classification

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012)

\mathbf{y} = f(\mathbf{w}\cdot\mathbf{x}+\mathbf{b})
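For reference, a minimal MXNet Gluon sketch of the kind of plain 1-D ConvNet classifier used in these past attempts: convolution/pooling layers for feature extraction followed by dense layers for classification. The layer sizes and kernel widths below are illustrative placeholders, not the exact configuration of our earlier networks.

```python
from mxnet import init, nd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(
    nn.Conv1D(channels=16, kernel_size=16, activation="relu"),   # feature extraction
    nn.MaxPool1D(pool_size=4),
    nn.Conv1D(channels=32, kernel_size=8, activation="relu"),
    nn.MaxPool1D(pool_size=4),
    nn.Flatten(),
    nn.Dense(64, activation="relu"),                             # classification head
    nn.Dense(2),                                                 # signal vs. noise logits
)
net.initialize(init.Xavier())

x = nd.random.normal(shape=(16, 1, 4096))   # [batch, channel, 1 s of strain at 4096 Hz]
print(net(x).shape)                         # (16, 2)
```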

Past attempts

Visualization of the high-dimensional feature maps of the learned network in each layer for the two-class problem, using t-SNE.

  • A glimpse of model interpretability and visualization
  • The influence of hyperparameters?

Effect of the number of convolutional layers on signal-recognition accuracy: marginal!

\mathbf{y} = f(\mathbf{w}\cdot\mathbf{x}+\mathbf{b})


Past attempts

  • A glimpse of model interpretability using visualization

Visualization of the top activations (on average) at the \(n\)th layer, projected back to the time domain using the deconvolutional network approach

\mathbf{y} = f(\mathbf{w}\cdot\mathbf{x}+\mathbf{b})

Occlusion Sensitivity

Peak of GW!


Past attempts

  • However, on real noise from LIGO this approach does not work as well (too sensitive, and it is hard to find the events).

A specific design of the architecture is needed.

[as argued by Timothy D. Gebhard et al. (2019)]

  • He Wang, Zhoujian Cao, et al. "Gravitational wave signal recognition of O1 data by deep learning." e-Print: arXiv:1909.13442 [gr-qc]
  • Motivation

  • Matched-filtering in time domain

  • Matched-filtering ConvNet

MF-ConvNet Model


MF-ConvNet Model

Motivation

Is it matched-filtering?

Matched-filtering (cross-correlation with the templates) can be regarded as a convolutional layer with a set of predefined kernels.

\langle h|h \rangle = 4\int^\infty_0\frac{\tilde{h}(f)\tilde{h}^*(f)}{S_n(f)}df
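A quick NumPy check of this statement: cross-correlating data with a template is identical to convolving the data with the time-reversed template, which is exactly what a convolutional layer with a fixed (predefined) kernel computes. The toy data and template below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.normal(size=4096)                             # one second of white "data"
h = np.sin(2 * np.pi * 50 * np.arange(512) / 4096)    # a toy "template" kernel

corr = np.correlate(d, h, mode="valid")               # cross-correlation with the template
conv = np.convolve(d, h[::-1], mode="valid")          # convolution with the flipped template
print(np.allclose(corr, conv))                        # True
```

Note that the "convolution" in most deep-learning frameworks is in fact cross-correlation, so templates can be used directly as kernels without flipping.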

MF-ConvNet Model

Matched-filtering in time domain

Frequency domain

The square of the matched-filtering SNR for given data \(d(t) = n(t)+h(t)\):

\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2

with

\langle d|h \rangle (t) = 4\int^\infty_0\frac{\tilde{d}(f)\tilde{h}^*(f)}{S_n(f)}e^{2\pi ift}df \,,\qquad \langle h|h \rangle = 4\int^\infty_0\frac{\tilde{h}(f)\tilde{h}^*(f)}{S_n(f)}df

MF-ConvNet Model

Matched-filtering in time domain

Time domain

\langle d|h \rangle (t) \sim \,\bar{d}(t)\ast\bar{h}(-t) \quad \text{(matched-filtering)}
\langle h|h \rangle \sim [\bar{h}(t) \ast \bar{h}(-t)]|_{t=0} \quad \text{(normalizing)}
\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2

where (whitening)

\left\{\begin{matrix} \bar{d}(t) = d(t) * \bar{S}_n(t) \\ \bar{h}(t) = h(t) * \bar{S}_n(t) \end{matrix}\right. \,,\qquad \bar{S}_n(t)=\int^{+\infty}_{-\infty}S_n^{-1/2}(f)e^{2\pi ift}df

\(S_n(|f|)\) is the one-sided average PSD of \(d(t)\)

(A schematic illustration for a unit of convolution layer)

MF-ConvNet Model

Matched-filtering in time domain

In the 1-D convolution (\(*\)), given input data with shape [batch size, channel, length]:

output[n, i, :] = \sum^{channel}_{j=0} input[n,j,:] \ast weight[i,j,:]

FYI: the output length is \(N_\ast = \lfloor(N-K+2P)/S\rfloor+1\)

Whitening, matched-filtering and normalizing are therefore 1-D convolutions with predefined kernels (the whitening kernel \(\bar{S}_n(t)\) and the time-reversed whitened templates \(\bar{h}(-t)\)), while

\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2

is obtained by a wrapping operation (like the pooling layer).

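Below is a NumPy sketch of one such unit, with the whitening, matched-filtering, normalizing and wrapping steps written explicitly as time-domain convolutions/correlations. It illustrates the scheme above rather than reproducing the exact MF-CNN implementation; the whitening-kernel construction (Welch PSD, truncated FIR kernel) and all names are our own choices.

```python
import numpy as np
from scipy.signal import fftconvolve, welch

fs = 4096

def whitening_kernel(strain, fs, half_width=2048):
    """A finite FIR approximation of S_n-bar(t), the inverse FFT of S_n(f)^(-1/2)."""
    freqs, psd = welch(strain, fs=fs, nperseg=4 * fs)      # one-sided average PSD of d(t)
    f = np.fft.rfftfreq(4 * fs, 1.0 / fs)
    kernel = np.fft.irfft(1.0 / np.sqrt(np.interp(f, freqs, psd)))
    return np.roll(kernel, half_width)[:2 * half_width]    # centre and truncate the kernel

def mf_unit(d, h, sn_bar):
    """One matched-filtering unit: returns max_t rho(t) (up to an overall constant)."""
    d_bar = fftconvolve(d, sn_bar, mode="same")            # whitening: d * S_n-bar
    h_bar = fftconvolve(h, sn_bar, mode="same")            # whitening: h * S_n-bar
    dh = fftconvolve(d_bar, h_bar[::-1], mode="valid")     # matched-filtering: d_bar * h_bar(-t)
    hh = np.sum(h_bar * h_bar)                             # normalizing: [h_bar * h_bar(-t)] at t = 0
    rho = np.abs(dh) / np.sqrt(hh)
    return rho.max()                                       # wrapping (like a pooling layer)

# Usage sketch: rho_m = mf_unit(data_segment, template, whitening_kernel(data_segment, fs))
```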

MF-ConvNet Model

Architecture

(Architecture diagram; whitening kernel \(\bar{S}_n(t)\))

\rho[1,C,N] = \frac{U[1,C,N]}{\sqrt{\sigma[1,C,0]\cdot fs}}
\rho_m[1,C,1] = \max_N{\rho[1,C,N]}

Meanwhile, we can obtain the optimal time \(N_0\) (relative to the input) of the matching feature response by recording the location of the maximum value corresponding to the optimal template \(C_0\):

C_0 = \mathop{\arg\max}_{C}\rho[1,C,N] \,,\qquad N_0 = \mathop{\arg\max}_{N} U[1,C_0,N]
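As a small illustration of this read-out step, the NumPy sketch below starts from a stand-in feature map U[1, C, N] and normalization sigma[1, C, 1] and recovers rho, rho_m, the optimal template C_0 and the optimal time N_0. Shapes and names mirror the slide notation; the numbers are random placeholders, not model outputs.

```python
import numpy as np

fs = 4096
C, N = 35, 4096                                   # templates x time samples (illustrative)
rng = np.random.default_rng(1)
U = rng.normal(size=(1, C, N))                    # stand-in for the correlation feature map U[1, C, N]
sigma = np.abs(rng.normal(size=(1, C, 1)))        # stand-in for the <h|h>-like normalization

rho = U / np.sqrt(sigma * fs)                     # rho[1, C, N] = U / sqrt(sigma * fs)
rho_m = rho.max(axis=-1, keepdims=True)           # rho_m[1, C, 1] = max_N rho[1, C, N]
C0 = int(rho_m.argmax())                          # optimal template C_0 (batch size is 1 here)
N0 = int(U[0, C0].argmax())                       # optimal time N_0 relative to the input
print(rho_m.shape, C0, N0)
```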
  • Experiments & Results

  • Dataset & Templates
  • Training Strategy
  • Search methodology
  • Recovering GW Events
  • Population properties on O1
  • He Wang, Zhoujian Cao, et al. "Gravitational wave signal recognition of O1 data by deep learning." e-Print: arXiv:1909.13442 [gr-qc]

Experiments & Results

Dataset & Templates

  • The background noise for training/testing is sampled from a closed set (33 × 4096 s) of the first observation run (O1), excluding the 4096 s segments containing the first three GW events.
  • We use the SEOBNRE model [Cao et al. (2017)] to generate waveforms; we only consider circular, spinless binary black holes.

                  template (equal mass)   waveform (train/test)
  Number          35                      1610
  Length (s)      1                       5

(Example waveform: 62.50 M⊙ + 57.50 M⊙, \(\rho_{amp}=0.5\))

FYI: sampling rate = 4096 Hz

(In preprint)
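For illustration, a hedged sketch of how one such template could be generated with PyCBC. The SEOBNRE approximant from Cao et al. is not part of a standard PyCBC installation, so the stock 'SEOBNRv4' approximant is substituted below purely as a stand-in; the masses and low-frequency cutoff are also just examples.

```python
import numpy as np
from pycbc.waveform import get_td_waveform

fs = 4096
hp, _ = get_td_waveform(approximant="SEOBNRv4",    # stand-in for SEOBNRE
                        mass1=30.0, mass2=30.0,    # equal-mass, spinless, circular BBH
                        spin1z=0.0, spin2z=0.0,
                        delta_t=1.0 / fs, f_lower=20.0)

wave = np.asarray(hp)
# Keep the last 1 second (merger near the end), zero-padding if the waveform is shorter.
template = wave[-fs:] if len(wave) >= fs else np.pad(wave, (fs - len(wave), 0))
print(len(template) / fs, "s template sampled at", fs, "Hz")
```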

(In preprint)

Experiments & Results

Probability output (sigmoid function)

  • GPUs: 4 × NVIDIA GeForce GTX 1080 Ti
  • MXNet: A Scalable Deep Learning Framework
  • A Tukey window is applied to both data and templates before they are input to the network.
  • Xavier initialization [X. Glorot & Y. Bengio (2010)]
  • Binary softmax cross-entropy loss
  • Optimizer: Adam [Diederik P. Kingma & Jimmy Ba (2014)]
  • Learning rate: 0.003
  • Batch size: 16 × 4
  • Curriculum learning: gradually decreasing the signal strength, with \(\rho_{amp}\) distributed at 1, 0.1, 0.03 and 0.02, where
\rho_{amp} = \frac{\max_t h}{\sqrt{\sigma_{noise}}}

Training Strategy
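A minimal MXNet Gluon sketch of this training configuration (Xavier initialization, binary softmax cross-entropy, Adam with learning rate 0.003, a batch split across devices). The tiny dense network and random batch below are placeholders for the actual MF-CNN and the curriculum data loader.

```python
import mxnet as mx
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import nn

# Stand-ins for the real MF-CNN and one curriculum batch (illustrative only).
net = nn.Sequential()
net.add(nn.Dense(64, activation="relu"), nn.Dense(2))
segments = nd.random.normal(shape=(64, 5 * 4096))       # 64 five-second segments at 4096 Hz
labels = nd.array([0, 1] * 32)

ctx = [mx.cpu()]                                        # in practice: [mx.gpu(i) for i in range(4)]
net.initialize(init.Xavier(), ctx=ctx)                  # Xavier initialization
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()          # binary softmax cross-entropy
trainer = gluon.Trainer(net.collect_params(), "adam", {"learning_rate": 0.003})

X = gluon.utils.split_and_load(segments, ctx)           # a 16 x 4 batch would be split across 4 GPUs
Y = gluon.utils.split_and_load(labels, ctx)
with autograd.record():
    losses = [loss_fn(net(x), y) for x, y in zip(X, Y)]
for l in losses:
    l.backward()
trainer.step(segments.shape[0])
print(sum(l.mean().asscalar() for l in losses) / len(losses))
```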

  • Every 5-second segment is taken as input to our MF-CNN, with a step size of 1 second.
  • The model scans the whole range of the input segment and outputs a probability score.
  • In the ideal case, with a GW signal hiding somewhere in the data, there should be 5 adjacent predictions above a threshold.

(In preprint)

Experiments & Results

Search methodology
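A NumPy sketch of this sliding-window search, where `predict` stands for the trained MF-CNN returning P(signal) for a single 5-second segment (a placeholder, not the actual model interface).

```python
import numpy as np

fs = 4096
window, step = 5 * fs, 1 * fs                    # 5-second input segments, 1-second stride

def scan(strain, predict, threshold=0.5):
    """Slide a 5 s window over the strain and collect the model's probability per step."""
    probs = [predict(strain[start:start + window])
             for start in range(0, len(strain) - window + 1, step)]
    probs = np.asarray(probs)
    return probs, probs > threshold              # per-second scores and trigger flags

# In the ideal case a GW signal fully contained in the data is covered by five
# consecutive windows, so it should yield 5 adjacent predictions above threshold.
```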


  • Recovering all GW events in both O1 and O2

(In preprint)

Experiments & Results


  • Statistical significance on O1
    • Count a group of adjacent predictions as one "trigger block".
    • For a pure (non-Gaussian) background, a monotone trend should be observed.
    • In the ideal case, with a GW signal hiding somewhere in the data, there should be 5 adjacent predictions above a threshold.
  • Sensitivity estimation (ROC)
    • Background: time-shifting the closed set of real LIGO recordings from O1
    • Injection: random simulated waveforms

(Histogram of trigger blocks vs. number of adjacent predictions: a bump at 5 adjacent predictions)

(ROC curve: true positive rate vs. false alarm rate)

Population properties on O1

Experiments & Results

(In preprint)
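A small NumPy sketch of this bookkeeping: group adjacent above-threshold predictions into trigger blocks and histogram the block lengths. A pure-noise background should give a monotonically decreasing histogram, while real signals add a bump at 5. The flag pattern below is made up for illustration.

```python
import numpy as np

def trigger_blocks(flags):
    """Lengths of runs of consecutive True values in the per-second trigger flags."""
    lengths, run = [], 0
    for f in flags:
        if f:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return np.asarray(lengths)

flags = np.array([0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0], dtype=bool)
lengths = trigger_blocks(flags)
print(lengths)               # [5 1 2]
print(np.bincount(lengths))  # histogram over the number of adjacent predictions
```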


  • Some benefits of the MF-CNN architecture:

    • Simple configuration for GW data generation

    • Almost no data pre-processing

    • Works on a non-stationary background
    • Easy parallel deployment; multiple detectors can benefit a lot from this design

    • More templates / smaller search steps can further improve performance
  • Main understanding of the algorithm:
    • GW templates are used as likely features for matching
    • A generalization of both matched-filtering and neural networks
    • Matched-filtering can be rewritten as convolutional neural layers

Summary

Thank you for your attention!
