Deep Learning Networks & Gravitational Wave Signal Recognition

He Wang (王赫)  

hewang@mail.bnu.edu.cn

Department of Physics, Beijing Normal University

In collaboration with Zhou-Jian Cao

July 16th, 2019

Topological data analysis and deep learning: theory and signal applications - Part 4 (ICIAM 2019)

Outline

  • Introduction
    • Background
    • Related works
  • ConvNet Model
    • Our past attempts
  • MF-ConvNet Model
    • Motivation
    • Matched-filtering in time domain
    • Matched-filtering ConvNet (MF-CNN)
  • Experiments & Results
    • Dataset & Training details
    • Recovering GW Events
    • Population properties on O1
  • Summary

Introduction

  • Problems
    • Current matched filtering techniques are computationally expensive.
    • Non-Gaussian noise limits the optimality of searches.
    • Un-modelled signals?

A trigger generator \(\rightarrow\) efficiency + completeness + informativeness

Background

  • Solution:
    • Machine learning (deep learning)
    • ...

Introduction

  • Existing CNN-based approaches:
    • Daniel George & E. A. Huerta (2018)
    • Hunter Gabbard et al. (2018)
    • X. Li et al. (2018)
    • Timothy D. Gebhard et al. (2019)
 

Related works

Introduction

  • Our main contributions:
    • A brand-new CNN-based architecture (MF-CNN)
    • An efficient training process (no bandpass filtering, etc.)
    • An effective search methodology (4–5 days over all of O1)
    • All GW events in O1/O2 fully recognized, with event times predicted precisely (<1 s)

ConvNet Model

  • Our past attempts

Past attempts on simulated noise

  • The influence of hyperparameters?

Convolutional neural network (ConvNet or CNN)

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012)

ConvNet Model

  • The influence of hyperparameters?

(Architecture diagram: feature extraction → merge part → classification)

  • A glimpse of model interpretability and visualization

Visualization of the high-dimensional feature maps of the learned network, layer by layer, for the binary classification, using t-SNE.

Past attempts on simulated noise

ConvNet Model

  • The influence of hyperparameters? Marginal!

(Architecture diagram: feature extraction → classification)

Effect of the number of convolutional layers on signal-recognition accuracy.

Past attempts on simulated noise

ConvNet Model

  • The influence of hyperparameters? Marginal!

  • A glimpse of model interpretability via visualization

Visualization of the top activations (on average) at the \(n\)th layer, projected back to the time domain using the deconvolutional network approach.

Past attempts on simulated noise

ConvNet Model

  • The influence of hyperparameters? Marginal!

  • A glimpse of model interpretability via visualization

Occlusion sensitivity (highlighting the peak of the GW signal).

Past attempts on simulated noise

ConvNet Model

  • The influence of hyperparameters? Marginal!

  • However, on real LIGO noise this approach does not work as well (it is too sensitive, and the events are hard to pick out).

A specific design of the architecture is needed [as in Timothy D. Gebhard et al. (2019)].

Past attempts on simulated noise

MF-ConvNet Model

  • Motivation

  • Matched-filtering in time domain

  • Matched-filtering ConvNet

(In preprint)

Motivation

Matched-filtering (cross-correlation with the templates) can be regarded as a convolutional layer with a set of predefined kernels.

MF-ConvNet Model


Is it matched-filtering?

Motivation

MF-ConvNet Model

Matched-filtering in time domain

The square of the matched-filtering SNR for given data \(d(t) = n(t)+h(t)\):

\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2

Frequency domain:

\langle d|h \rangle (t) = 4\int^\infty_0\frac{\tilde{d}(f)\tilde{h}^*(f)}{S_n(f)}e^{2\pi ift}df

MF-ConvNet Model

Matched-filtering in time domain

Frequency domain:

\langle d|h \rangle (t) = 4\int^\infty_0\frac{\tilde{d}(f)\tilde{h}^*(f)}{S_n(f)}e^{2\pi ift}df   (matched-filtering)

\langle h|h \rangle = 4\int^\infty_0\frac{\tilde{h}(f)\tilde{h}^*(f)}{S_n(f)}df   (normalizing)

Time domain:

\bar{d}(t) = d(t) * \bar{S}_n(t)\,, \qquad \bar{h}(t) = h(t) * \bar{S}_n(t)   (whitening)

where \(\bar{S}_n(t)=\int^{+\infty}_{-\infty}S_n^{-1/2}(f)e^{2\pi ift}df\) and \(S_n(|f|)\) is the one-sided average PSD of \(d(t)\). Then

\langle d|h \rangle (t) \sim \bar{d}(t)\ast\bar{h}(-t)   (matched-filtering)

\langle h|h \rangle \sim [\bar{h}(t) \ast \bar{h}(-t)]\big|_{t=0}   (normalizing)

\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2
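Concretely, the whitening step above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the code behind these slides: `strain` and the Welch settings are placeholder assumptions, and overall normalization constants are omitted.

    import numpy as np
    from scipy.signal import welch

    fs = 4096                          # sampling rate (Hz), as in these slides
    strain = np.random.randn(16 * fs)  # placeholder for a strain segment d(t)

    # One-sided average PSD S_n(f), estimated with Welch's method.
    freqs, psd = welch(strain, fs=fs, nperseg=4 * fs)

    # Whitening: multiply by S_n^{-1/2}(f) in the frequency domain, which is
    # the time-domain convolution d(t) * \bar{S}_n(t) (up to normalization).
    rfreqs = np.fft.rfftfreq(len(strain), d=1.0 / fs)
    psd_interp = np.interp(rfreqs, freqs, psd)
    strain_white = np.fft.irfft(np.fft.rfft(strain) / np.sqrt(psd_interp),
                                n=len(strain))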

MF-ConvNet Model

Matched-filtering in time domain

The square of the matched-filtering SNR for given data \(d(t) = n(t)+h(t)\):

\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2

In a 1-D convolution (\(\ast\)), given input data with shape [batch size, channel, length]:

output[n, i, :] = \sum^{channel}_{j=0} input[n,j,:] \ast weight[i,j,:]

FYI: the output length is \(N_\ast = \lfloor(N-K+2P)/S\rfloor+1\) for input length \(N\), kernel size \(K\), padding \(P\), and stride \(S\).

With the whitened series \(\bar{d}(t) = d(t) * \bar{S}_n(t)\) and \(\bar{h}(t) = h(t) * \bar{S}_n(t)\) from above:

\langle d|h \rangle (t) \sim \bar{d}(t)\ast\bar{h}(-t)   (matched-filtering)

\langle h|h \rangle \sim [\bar{h}(t) \ast \bar{h}(-t)]\big|_{t=0}   (normalizing)
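To make the correspondence concrete, here is a minimal sketch (illustrative only; `d_bar`, `h_bar`, and their shapes are placeholder assumptions): matched filtering against a template bank is a valid-mode cross-correlation, i.e. a 1-D convolution whose \(C\) kernels are the whitened templates.

    import numpy as np

    def conv1d_bank(x, kernels):
        """Cross-correlate one whitened series with C whitened templates.

        x: shape [N]; kernels: shape [C, K]. Returns shape [C, N - K + 1],
        matching N_* = (N - K + 2*0)/1 + 1 (no padding, stride 1).
        """
        C, K = kernels.shape
        out = np.empty((C, x.shape[0] - K + 1))
        for i in range(C):
            # np.correlate slides the template without flipping it, which is
            # exactly the cross-correlation <d|h>(t) ~ d_bar(t) * h_bar(-t).
            out[i] = np.correlate(x, kernels[i], mode="valid")
        return out

    d_bar = np.random.randn(5 * 4096)   # 5 s whitened input segment
    h_bar = np.random.randn(35, 4096)   # 35 whitened 1 s templates
    U = conv1d_bank(d_bar, h_bar)       # matched-filtering responses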

MF-ConvNet Model

Matched-filtering in time domain

(A schematic illustration of a unit of the convolution layer, implementing the time-domain equations above.)

MF-ConvNet Model

Architecture

Wrapping (like the pooling layer):

\rho[1,C,N] = \frac{U[1,C,N]}{\sqrt{\sigma[1,C,0]\cdot fs}}

\rho_m[1,C,1] = \max_N \rho[1,C,N]

where \(U\) is the matched-filtering response from the convolution layer, \(\sigma\) its normalization, and \(fs\) the sampling rate.

MF-ConvNet Model

Architecture

\rho[1,C,N] = \frac{U[1,C,N]}{\sqrt{\sigma[1,C,0]\cdot fs}}\,, \qquad \rho_m[1,C,1] = \max_N \rho[1,C,N]

C_0 = \mathop{\arg\max}_{C}\rho[1,C,N] \,, \qquad N_0 = \mathop{\arg\max}_{N} U[1,C_0,N]

Meanwhile, we can obtain the optimal matching time \(N_0\) (relative to the input) by recording the location of the maximum value of the response corresponding to the optimal template \(C_0\).
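A minimal NumPy sketch of this wrapping step (array names follow the slide notation; the random inputs are placeholders):

    import numpy as np

    fs = 4096
    U = np.abs(np.random.randn(1, 35, 4 * fs))       # responses U[1, C, N]
    sigma = np.abs(np.random.randn(1, 35, 1)) + 1.0  # norms sigma[1, C, 0]

    rho = U / np.sqrt(sigma * fs)           # rho[1, C, N]
    rho_m = rho.max(axis=2, keepdims=True)  # max-pool over time: rho_m[1, C, 1]

    C0 = int(np.argmax(rho_m[0, :, 0]))     # optimal template
    N0 = int(np.argmax(U[0, C0, :]))        # optimal time (sample index) for C0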

MF-ConvNet Model

Experiments & Results

  • Dataset & Templates
  • Training Strategy
  • Search methodology
  • Recovering GW Events
  • Population properties on O1

Experiments & Results

Dataset & Templates

              template      waveform (train/test)
  Number      35            1610
  Length (s)  1             5
  Mass ratio  equal mass

  • We use the SEOBNRE model [Cao et al. (2017)] to generate the waveforms; we consider only circular, spinless binary black holes (see the sketch below).
  • The background noise for training/testing is sampled from a closed set (33 × 4096 s) of the first observing run (O1), excluding the 4096 s segments that contain the first three GW events.

FYI: sampling rate = 4096 Hz

(In preprint)

(Example injection: 62.50 M⊙ + 57.50 M⊙, \(\rho_{amp}=0.5\))
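For illustration, the template generation could look like the sketch below, assuming a PyCBC-style `get_td_waveform`; whether the SEOBNRE approximant is exposed depends on the local waveform build, so treat the call as hypothetical.

    # Hedged sketch: PyCBC-style waveform call; SEOBNRE availability depends
    # on the local build, so this is illustrative rather than authoritative.
    from pycbc.waveform import get_td_waveform

    hp, hc = get_td_waveform(approximant="SEOBNRE",
                             mass1=62.5, mass2=57.5,  # BBH component masses
                             spin1z=0.0, spin2z=0.0,  # spinless, circular
                             delta_t=1.0 / 4096,      # 4096 Hz sampling
                             f_lower=20.0)            # starting frequency (Hz)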

Experiments & Results

(In preprint)

Training Strategy

  • Tukey window applied to both data and templates before input to the network.
  • Xavier initialization [X. Glorot & Y. Bengio (2010)]
  • Binary softmax cross-entropy loss
  • Optimizer: Adam [Diederik P. Kingma & Jimmy Ba (2014)]
  • Learning rate: 0.003
  • Batch size: 16 × 4
  • Curriculum learning: progressively decreasing the signal strength, with \(\rho_{amp} = \max_t h / \sqrt{\sigma_{noise}}\) distributed at 1, 0.1, 0.03 and 0.02 (see the sketch after this list).
  • GPUs: 4 × NVIDIA GeForce GTX 1080Ti
  • MXNet: a scalable deep learning framework
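A minimal sketch of the curriculum scaling, assuming \(\rho_{amp} = \max_t h / \sqrt{\sigma_{noise}}\) with \(\sigma_{noise}\) the noise variance (the helper and arrays are placeholders):

    import numpy as np

    def scale_injection(h, noise, rho_amp_target):
        """Rescale h so that max_t h / sqrt(var(noise)) equals the target."""
        rho_amp = h.max() / np.sqrt(noise.var())
        return h * (rho_amp_target / rho_amp)

    h = np.random.randn(4096)          # placeholder 1 s waveform
    noise = np.random.randn(5 * 4096)  # placeholder 5 s noise segment

    # Curriculum learning: train in stages on progressively weaker signals.
    for target in [1.0, 0.1, 0.03, 0.02]:
        h_scaled = scale_injection(h, noise, target)
        # ...inject h_scaled into noise and train one stage at this strength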

(Figure: output probability, via the sigmoid function)

Experiments & Results

(In preprint)

Search methodology

Experiments & Results

(In preprint)

  • Every 5-second segment is taken as input to our MF-CNN, with a step size of 1 second.
  • In the ideal case, with a GW signal hiding somewhere in the data, there should be 5 adjacent predictions above a given threshold (see the sketch below).
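A minimal sketch of this sliding-window search (illustrative only; `model` stands in for the trained MF-CNN returning a probability per segment):

    import numpy as np

    fs, window, stride = 4096, 5, 1    # Hz, seconds, seconds

    def search(strain, model, threshold=0.5):
        """Slide 5 s windows over the strain at 1 s steps; collect predictions."""
        n_steps = (len(strain) // fs - window) // stride + 1
        probs = np.array([
            model(strain[i * stride * fs:(i * stride + window) * fs])
            for i in range(n_steps)
        ])
        # Ideally a real signal yields ~5 adjacent hits, one per overlapping
        # window that contains it.
        return probs > threshold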

Experiments & Results

(In preprint)

  • Recovering three GW events in O1


Experiments & Results

(In preprint)

  • Recovering all GW events in O2


Experiments & Results

(In progress)

Population properties on O1

  • Sensitivity estimation
    • Background: estimated using time slides on the closed set of real LIGO recordings (see the sketch below)
    • Injections: randomly simulated waveforms
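One common way to realize such time slides, as a minimal sketch (the strain arrays and the lag choice are placeholder assumptions):

    import numpy as np

    def time_slides(h1, l1, n_slides=100, lag=4096):
        """Generate background-only pairs by circularly shifting one stream."""
        backgrounds = []
        for k in range(1, n_slides + 1):
            # A shift of k seconds is far beyond the inter-detector light
            # travel time, so no astrophysical signal can stay coincident.
            backgrounds.append((h1, np.roll(l1, k * lag)))
        return backgrounds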

(Figure: detection ratio)

Experiments & Results

Population properties on O1

(In progress)

  • Statistical significance on O1
    • A group of adjacent predictions is counted as one "trigger block".
    • For a pure (non-Gaussian) background, a monotone trend should be observed.

Experiments & Results

Population properties on O1

(In progress)

  • Statistical significance on O1

Interesting!

Summary

  • Some benefits of the MF-CNN architecture

    • Simple configuration for GW data generation
    • Almost no data pre-processing
    • Works on a non-stationary background
    • Fast, with high accuracy on all the events
    • Easy parallel deployment; multiple detectors can benefit greatly from this design

    • More templates / smaller search steps can improve the results further
    • Rethinking deep learning neural networks?

Summary

Thank you for your attention!
