Gravitational-Wave Data Analysis

via Deep Learning

He Wang (王赫)  

[hewang@mail.bnu.edu.cn]

Department of Physics, Beijing Normal University

On behalf of the KAGRA collaboration

 

In collaboration with Prof. Zhou-Jian Cao

Jan 10th, 2020

Key Laboratory of Computational Geodynamics, CAS

Outline

  • Introduction
    • Background
    • Basics of data analysis
  • GW detection in simulated Gaussian noise
    • Our past attempts
  • MF-ConvNet Model
    • Motivation
    • Matched-filtering in time domain
    • Matched-filtering ConvNet (MF-CNN)
  • Experiments & Results
    • Dataset & Training details
    • Recovering GW Events
    • Population properties on O1
  • Summary

Introduction

Event GW150914

A chirp signal from gravitational waves emitted by two coalescing black holes was observed with the LIGO detectors by the LIGO-Virgo Collaboration on September 14, 2015.

Background



Introduction

Laser interferometer detectors

Observation run O1: September 12, 2015 - January 19, 2016

 

B. P. Abbott et al., Prospects for Observing and Localizing Gravitational-Wave Transients with Advanced LIGO, Advanced Virgo and KAGRA, 2016, Living Rev. Relativity 19

Abbott et al., PRX 6, 041015 (2016)

Background

Introduction

Multi-messenger astrophysics

Zhou-Jian Cao. From Gravitational-Wave Detection to Multi-Messenger Astronomy Including Gravitational Waves [J]. 大学物理 (College Physics), 2018, 37(2).

Event GW170817 (the first binary neutron star merger event)

LIGO and Virgo make first detection of gravitational waves produced by colliding neutron stars. The discovery marks the first cosmic event observed in both gravitational waves and light.


Objective

  1. Direct link between NS-NS mergers and GRBs
  2. Direct link between NS-NS mergers and kilonovae
  3. Determine NS-NS mergers as the origin of heavy elements
  4. Consistency with theoretical models

Introduction

2010 Class. Quantum Grav. 27 084005

Data analysis

  • Zhou-Jian Cao. Gravitational-Wave Detection and Gravitational-Wave Astronomy [J]. 现代物理知识 (Modern Physics), 2015, 27(5): 42-46.
  • LIGO Scientific Collaboration, and Virgo Collaboration. "A guide to LIGO-Virgo detector noise and extraction of transient gravitational-wave signals." arXiv:1908.11170 (2019).
 
 

Basics of Data Analysis



Introduction

Data analysis and Matched-filtering techniques

  • Zhou-Jian Cao, Zhi-Hui Du. Numerical Relativity and Gravitational-Wave Astronomy [J]. 中国科学: 物理学 力学 天文学 (Scientia Sinica Physica, Mechanica & Astronomica), (1): 72.
  • LIGO Scientific Collaboration, and Virgo Collaboration. "A guide to LIGO-Virgo detector noise and extraction of transient gravitational-wave signals." arXiv:1908.11170 (2019).
 
 

Whitening and filtering

  • Nothing can be seen from the raw data
  • The signal is located around t = 0, yet no special feature is visible there

  • Enlarge the time region where the signal is located
  • Apparently it is dominated by some characteristic behavior of the detector

  • Rule out data that clearly has nothing to do with the GW signal (band-pass filter, band-stop filter)

  • Calculate the matched-filtering SNR for a target template

https://www.gw-openscience.org

Basics of Data Analysis
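To make the whitening, band-pass filtering and matched-filtering steps above concrete, here is a minimal NumPy/SciPy sketch following the conventions of the GWOSC open-data tutorials. It is an illustration, not the LIGO-Virgo production pipeline, and all function names are ours.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch

fs = 4096  # sampling rate (Hz)

def estimate_psd(strain, fs):
    """One-sided average PSD S_n(|f|) of the data, via Welch's method."""
    return welch(strain, fs=fs, nperseg=4 * fs)

def whiten(x, psd_freqs, psd, fs):
    """Whitening: divide the rFFT of x by the (interpolated) amplitude spectral density."""
    dt = 1.0 / fs
    xf = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), dt)
    return np.fft.irfft(xf / np.sqrt(np.interp(f, psd_freqs, psd) / (2.0 * dt)), n=len(x))

def bandpass(x, fs, f_lo=20.0, f_hi=500.0):
    """Band-pass: rule out frequency bands that carry no BBH signal."""
    b, a = butter(4, [f_lo / (0.5 * fs), f_hi / (0.5 * fs)], btype="band")
    return filtfilt(b, a, x)

def mf_snr(data_white, template_white):
    """Matched-filtering SNR time series rho(t) for whitened data and a whitened template."""
    corr = np.correlate(data_white, template_white, mode="valid")   # ~ <d|h>(t)
    return np.abs(corr) / np.sqrt(np.sum(template_white ** 2))      # / sqrt(<h|h>)

# Usage sketch: `strain` is raw detector data, `template` a waveform, both sampled at fs.
# freqs, psd = estimate_psd(strain, fs)
# d_w = bandpass(whiten(strain, freqs, psd, fs), fs)
# h_w = bandpass(whiten(template, freqs, psd, fs), fs)
# print(mf_snr(d_w, h_w).max())
```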


GW Detection in Simulated Background Noise

  • Zhou-Jian Cao, He Wang, Jian-Yang Zhu. A Preliminary Study on the Application of Deep Learning in Gravitational-Wave Data Processing [J]. 河南师范大学学报(自然科学版) (Journal of Henan Normal University, Natural Science Edition), 2018, 46(2): 32-45.
  • He Wang, et al. Representation Learning of Noisy Gravitational Waves by Convolutional Networks. In preparation (2019)
  • Problems
    • Current matched filtering techniques are computationally expensive.
    • Non-Gaussian noise limits the optimality of searches.
    • Difficult to find GW signals beyond theoretical expectations (un-modelled signals?)

Background

  • Existing CNN-based approaches:
    • Daniel George & E. A. Huerta (2018)
    • Hunter Gabbard et al. (2018)
    • X. Li et al. (2018)
    • Timothy D. Gebhard et al. (2019)

Solution:

Machine Learning / Deep Learning

ABC of ML

A map / algorithm takes an input \(\mathbf{x}\) (e.g., a sequence \(\{a_1, a_2, \dots, a_n\}\)) to an output \(\mathbf{y}\) (a number, or a yes/no decision \(\{0 \text{ or } 1\}\)):

\mathbf{y} = f(\mathbf{w}\cdot\mathbf{x}+\mathbf{b})

Our model / network

Past attempts

  • Initial model / network

Convolutional neural network (ConvNet or CNN): feature extraction + classification

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012)

\mathbf{y} = f(\mathbf{w}\cdot\mathbf{x}+\mathbf{b})
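For reference, a minimal MXNet Gluon sketch of the kind of plain 1-D ConvNet classifier used in these past attempts: convolution/pooling layers for feature extraction followed by dense layers for classification. The layer sizes and kernel widths below are illustrative placeholders, not the exact configuration of our earlier networks.

```python
from mxnet import init, nd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(
    nn.Conv1D(channels=16, kernel_size=16, activation="relu"),   # feature extraction
    nn.MaxPool1D(pool_size=4),
    nn.Conv1D(channels=32, kernel_size=8, activation="relu"),
    nn.MaxPool1D(pool_size=4),
    nn.Flatten(),
    nn.Dense(64, activation="relu"),                             # classification head
    nn.Dense(2),                                                 # signal vs. noise logits
)
net.initialize(init.Xavier())

x = nd.random.normal(shape=(16, 1, 4096))   # [batch, channel, 1 s of strain at 4096 Hz]
print(net(x).shape)                         # (16, 2)
```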

Past attempts

Visualization of the high-dimensional feature maps of the learned network in each layer for the two-class problem, using t-SNE.

  • A glimpse of model interpretability and visualization
  • The influence of hyperparameters?

Effect of the number of convolutional layers on signal-recognition accuracy: marginal!

\mathbf{y} = f(\mathbf{w}\cdot\mathbf{x}+\mathbf{b})


Past attempts

  • A glimpse of model interpretability using visualization

Visualization of the top activations (on average) at the \(n\)th layer, projected back to the time domain using the deconvolutional network approach

\mathbf{y} = f(\mathbf{w}\cdot\mathbf{x}+\mathbf{b})

Occlusion Sensitivity

Peak of GW!


Past attempts

  • However, on real noise from LIGO this approach does not work as well (too sensitive, and it is hard to find the events).

A specific design of the architecture is needed.

[as argued by Timothy D. Gebhard et al. (2019)]

  • He Wang, Zhoujian Cao, et al. "Gravitational wave signal recognition of O1 data by deep learning." e-Print: arXiv:1909.13442 [gr-qc]
  • Motivation

  • Matched-filtering in time domain

  • Matched-filtering ConvNet

MF-ConvNet Model


MF-ConvNet Model

Motivation

Is it matched-filtering?

Matched-filtering (cross-correlation with the templates) can be regarded as a convolutional layer with a set of predefined kernels.

\langle h|h \rangle = 4\int^\infty_0\frac{\tilde{h}(f)\tilde{h}^*(f)}{S_n(f)}df
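A quick NumPy check of this statement: cross-correlating data with a template is identical to convolving the data with the time-reversed template, which is exactly what a convolutional layer with a fixed (predefined) kernel computes. The toy data and template below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.normal(size=4096)                             # one second of white "data"
h = np.sin(2 * np.pi * 50 * np.arange(512) / 4096)    # a toy "template" kernel

corr = np.correlate(d, h, mode="valid")               # cross-correlation with the template
conv = np.convolve(d, h[::-1], mode="valid")          # convolution with the flipped template
print(np.allclose(corr, conv))                        # True
```

Note that the "convolution" in most deep-learning frameworks is in fact cross-correlation, so templates can be used directly as kernels without flipping.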

MF-ConvNet Model

Matched-filtering in time domain

Frequency domain

The square of the matched-filtering SNR for given data \(d(t) = n(t)+h(t)\):

\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2

with

\langle d|h \rangle (t) = 4\int^\infty_0\frac{\tilde{d}(f)\tilde{h}^*(f)}{S_n(f)}e^{2\pi ift}df \,,\qquad \langle h|h \rangle = 4\int^\infty_0\frac{\tilde{h}(f)\tilde{h}^*(f)}{S_n(f)}df

MF-ConvNet Model

Matched-filtering in time domain

Time domain

\langle d|h \rangle (t) \sim \,\bar{d}(t)\ast\bar{h}(-t) \quad \text{(matched-filtering)}
\langle h|h \rangle \sim [\bar{h}(t) \ast \bar{h}(-t)]|_{t=0} \quad \text{(normalizing)}
\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2

where (whitening)

\left\{\begin{matrix} \bar{d}(t) = d(t) * \bar{S}_n(t) \\ \bar{h}(t) = h(t) * \bar{S}_n(t) \end{matrix}\right. \,,\qquad \bar{S}_n(t)=\int^{+\infty}_{-\infty}S_n^{-1/2}(f)e^{2\pi ift}df

\(S_n(|f|)\) is the one-sided average PSD of \(d(t)\)

(A schematic illustration for a unit of convolution layer)

MF-ConvNet Model

Matched-filtering in time domain

In the 1-D convolution (\(*\)), given input data with shape [batch size, channel, length]:

output[n, i, :] = \sum^{channel}_{j=0} input[n,j,:] \ast weight[i,j,:]

FYI: the output length is \(N_\ast = \lfloor(N-K+2P)/S\rfloor+1\)

Whitening, matched-filtering and normalizing are therefore 1-D convolutions with predefined kernels (the whitening kernel \(\bar{S}_n(t)\) and the time-reversed whitened templates \(\bar{h}(-t)\)), while

\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2

is obtained by a wrapping operation (like the pooling layer).

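Below is a NumPy sketch of one such unit, with the whitening, matched-filtering, normalizing and wrapping steps written explicitly as time-domain convolutions/correlations. It illustrates the scheme above rather than reproducing the exact MF-CNN implementation; the whitening-kernel construction (Welch PSD, truncated FIR kernel) and all names are our own choices.

```python
import numpy as np
from scipy.signal import fftconvolve, welch

fs = 4096

def whitening_kernel(strain, fs, half_width=2048):
    """A finite FIR approximation of S_n-bar(t), the inverse FFT of S_n(f)^(-1/2)."""
    freqs, psd = welch(strain, fs=fs, nperseg=4 * fs)      # one-sided average PSD of d(t)
    f = np.fft.rfftfreq(4 * fs, 1.0 / fs)
    kernel = np.fft.irfft(1.0 / np.sqrt(np.interp(f, freqs, psd)))
    return np.roll(kernel, half_width)[:2 * half_width]    # centre and truncate the kernel

def mf_unit(d, h, sn_bar):
    """One matched-filtering unit: returns max_t rho(t) (up to an overall constant)."""
    d_bar = fftconvolve(d, sn_bar, mode="same")            # whitening: d * S_n-bar
    h_bar = fftconvolve(h, sn_bar, mode="same")            # whitening: h * S_n-bar
    dh = fftconvolve(d_bar, h_bar[::-1], mode="valid")     # matched-filtering: d_bar * h_bar(-t)
    hh = np.sum(h_bar * h_bar)                             # normalizing: [h_bar * h_bar(-t)] at t = 0
    rho = np.abs(dh) / np.sqrt(hh)
    return rho.max()                                       # wrapping (like a pooling layer)

# Usage sketch: rho_m = mf_unit(data_segment, template, whitening_kernel(data_segment, fs))
```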

MF-ConvNet Model

Architecture

(Architecture diagram; whitening kernel \(\bar{S}_n(t)\))

\rho[1,C,N] = \frac{U[1,C,N]}{\sqrt{\sigma[1,C,0]\cdot fs}}
\rho_m[1,C,1] = \max_N{\rho[1,C,N]}

Meanwhile, we can obtain the optimal time \(N_0\) (relative to the input) of the matching feature response by recording the location of the maximum value corresponding to the optimal template \(C_0\):

C_0 = \mathop{\arg\max}_{C}\rho[1,C,N] \,,\qquad N_0 = \mathop{\arg\max}_{N} U[1,C_0,N]
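As a small illustration of this read-out step, the NumPy sketch below starts from a stand-in feature map U[1, C, N] and normalization sigma[1, C, 1] and recovers rho, rho_m, the optimal template C_0 and the optimal time N_0. Shapes and names mirror the slide notation; the numbers are random placeholders, not model outputs.

```python
import numpy as np

fs = 4096
C, N = 35, 4096                                   # templates x time samples (illustrative)
rng = np.random.default_rng(1)
U = rng.normal(size=(1, C, N))                    # stand-in for the correlation feature map U[1, C, N]
sigma = np.abs(rng.normal(size=(1, C, 1)))        # stand-in for the <h|h>-like normalization

rho = U / np.sqrt(sigma * fs)                     # rho[1, C, N] = U / sqrt(sigma * fs)
rho_m = rho.max(axis=-1, keepdims=True)           # rho_m[1, C, 1] = max_N rho[1, C, N]
C0 = int(rho_m.argmax())                          # optimal template C_0 (batch size is 1 here)
N0 = int(U[0, C0].argmax())                       # optimal time N_0 relative to the input
print(rho_m.shape, C0, N0)
```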
  • Experiments & Results

  • Dataset & Templates
  • Training Strategy
  • Search methodology
  • Recovering GW Events
  • Population properties on O1
  • He Wang, Zhoujian Cao, et al. "Gravitational wave signal recognition of O1 data by deep learning." e-Print: arXiv:1909.13442 [gr-qc]

Experiments & Results

Dataset & Templates

  • The background noise for training/testing is sampled from a closed set (33 × 4096 s) of the first observation run (O1), excluding the 4096 s segments containing the first three GW events.
  • We use the SEOBNRE model [Cao et al. (2017)] to generate waveforms; we only consider circular, spinless binary black holes.

                  template (equal mass)   waveform (train/test)
  Number          35                      1610
  Length (s)      1                       5

(Example waveform: 62.50 M⊙ + 57.50 M⊙, \(\rho_{amp}=0.5\))

FYI: sampling rate = 4096 Hz

(In preprint)
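For illustration, a hedged sketch of how one such template could be generated with PyCBC. The SEOBNRE approximant from Cao et al. is not part of a standard PyCBC installation, so the stock 'SEOBNRv4' approximant is substituted below purely as a stand-in; the masses and low-frequency cutoff are also just examples.

```python
import numpy as np
from pycbc.waveform import get_td_waveform

fs = 4096
hp, _ = get_td_waveform(approximant="SEOBNRv4",    # stand-in for SEOBNRE
                        mass1=30.0, mass2=30.0,    # equal-mass, spinless, circular BBH
                        spin1z=0.0, spin2z=0.0,
                        delta_t=1.0 / fs, f_lower=20.0)

wave = np.asarray(hp)
# Keep the last 1 second (merger near the end), zero-padding if the waveform is shorter.
template = wave[-fs:] if len(wave) >= fs else np.pad(wave, (fs - len(wave), 0))
print(len(template) / fs, "s template sampled at", fs, "Hz")
```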

(In preprint)

Experiments & Results

Probability output (sigmoid function)

  • GPUs: 4 × NVIDIA GeForce GTX 1080 Ti
  • MXNet: A Scalable Deep Learning Framework
  • A Tukey window is applied to both data and templates before they are input to the network.
  • Xavier initialization [X. Glorot & Y. Bengio (2010)]
  • Binary softmax cross-entropy loss
  • Optimizer: Adam [Diederik P. Kingma & Jimmy Ba (2014)]
  • Learning rate: 0.003
  • Batch size: 16 × 4
  • Curriculum learning: gradually decreasing the signal strength, with \(\rho_{amp}\) distributed at 1, 0.1, 0.03 and 0.02, where
\rho_{amp} = \frac{\max_t h}{\sqrt{\sigma_{noise}}}

Training Strategy
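A minimal MXNet Gluon sketch of this training configuration (Xavier initialization, binary softmax cross-entropy, Adam with learning rate 0.003, a batch split across devices). The tiny dense network and random batch below are placeholders for the actual MF-CNN and the curriculum data loader.

```python
import mxnet as mx
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import nn

# Stand-ins for the real MF-CNN and one curriculum batch (illustrative only).
net = nn.Sequential()
net.add(nn.Dense(64, activation="relu"), nn.Dense(2))
segments = nd.random.normal(shape=(64, 5 * 4096))       # 64 five-second segments at 4096 Hz
labels = nd.array([0, 1] * 32)

ctx = [mx.cpu()]                                        # in practice: [mx.gpu(i) for i in range(4)]
net.initialize(init.Xavier(), ctx=ctx)                  # Xavier initialization
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()          # binary softmax cross-entropy
trainer = gluon.Trainer(net.collect_params(), "adam", {"learning_rate": 0.003})

X = gluon.utils.split_and_load(segments, ctx)           # a 16 x 4 batch would be split across 4 GPUs
Y = gluon.utils.split_and_load(labels, ctx)
with autograd.record():
    losses = [loss_fn(net(x), y) for x, y in zip(X, Y)]
for l in losses:
    l.backward()
trainer.step(segments.shape[0])
print(sum(l.mean().asscalar() for l in losses) / len(losses))
```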

  • Every 5-second segment is taken as input to our MF-CNN, with a step size of 1 second.
  • The model scans the whole range of the input segment and outputs a probability score.
  • In the ideal case, with a GW signal hiding somewhere in the data, there should be 5 adjacent predictions above a threshold.

(In preprint)

Experiments & Results

Search methodology
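A NumPy sketch of this sliding-window search, where `predict` stands for the trained MF-CNN returning P(signal) for a single 5-second segment (a placeholder, not the actual model interface).

```python
import numpy as np

fs = 4096
window, step = 5 * fs, 1 * fs                    # 5-second input segments, 1-second stride

def scan(strain, predict, threshold=0.5):
    """Slide a 5 s window over the strain and collect the model's probability per step."""
    probs = [predict(strain[start:start + window])
             for start in range(0, len(strain) - window + 1, step)]
    probs = np.asarray(probs)
    return probs, probs > threshold              # per-second scores and trigger flags

# In the ideal case a GW signal fully contained in the data is covered by five
# consecutive windows, so it should yield 5 adjacent predictions above threshold.
```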


  • Recovering all GW events in both O1 and O2

(In preprint)

Experiments & Results


  • Statistical significance on O1
    • Count a group of adjacent predictions as one "trigger block".
    • For a pure (non-Gaussian) background, a monotone trend should be observed.
    • In the ideal case, with a GW signal hiding somewhere in the data, there should be 5 adjacent predictions above a threshold.
  • Sensitivity estimation (ROC)
    • Background: time-shifting the closed set of real LIGO recordings from O1
    • Injection: random simulated waveforms

(Histogram of trigger blocks vs. number of adjacent predictions: a bump at 5 adjacent predictions)

(ROC curve: true positive rate vs. false alarm rate)

Population properties on O1

Experiments & Results

(In preprint)
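A small NumPy sketch of this bookkeeping: group adjacent above-threshold predictions into trigger blocks and histogram the block lengths. A pure-noise background should give a monotonically decreasing histogram, while real signals add a bump at 5. The flag pattern below is made up for illustration.

```python
import numpy as np

def trigger_blocks(flags):
    """Lengths of runs of consecutive True values in the per-second trigger flags."""
    lengths, run = [], 0
    for f in flags:
        if f:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return np.asarray(lengths)

flags = np.array([0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0], dtype=bool)
lengths = trigger_blocks(flags)
print(lengths)               # [5 1 2]
print(np.bincount(lengths))  # histogram over the number of adjacent predictions
```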


  • Some benefits of the MF-CNN architecture:

    • Simple configuration for GW data generation

    • Almost no data pre-processing

    • Works on a non-stationary background
    • Easy parallel deployment; multiple detectors can benefit a lot from this design

    • More templates / smaller search steps can further improve performance
  • Main understanding of the algorithm:
    • GW templates are used as likely features for matching
    • A generalization of both matched-filtering and neural networks
    • Matched-filtering can be rewritten as convolutional neural layers

Summary

Thank you for your attention!
