Fundamentals of Machine Learning for Gravitational Wave Search

Lecturer: He Wang (王赫)

2025/11/13 @FQCP2025

ICTP-AP, UCAS

Content

  • Who Am I
    — A quick intro and how I got into this field

  • What Is Machine Learning?
    — The basics and why it matters

  • Deep Learning: When Machines Start to See and Think
    — From neural networks to powerful representations

  • Gravitational Waves Meet Machine Learning
    — How ML is reshaping data analysis in GW astronomy

  • Let’s Get Practical: Searching for Gravitational Waves
    — A hands-on look at applying ML in real GW searches

  • LLMs for Gravitational Waves: My Ongoing Work
    — Towards automated and interpretable scientific discovery

# Who am I


He Wang received his Ph.D. in Theoretical Physics from Beijing Normal University in 2020. He is currently an Associate Researcher (E-Series) at ICTP-AP, UCAS. After completing his Ph.D., he conducted postdoctoral research at ITP-CAS, the Peng Cheng Laboratory (as a visiting scholar), and UCAS.
 

He serves as the Co-chair of the LVK Machine Learning Algorithms Group, a Core Member of the LISA Consortium, and a Youth Data Scientist at the National Astronomical Data Center (NADC). As a core contributor to China’s Taiji Program for Space Gravitational Wave Detection, his work focuses on scientific data analysis and algorithmic development.

Teaching

Selected Works

  • HW, Liang Zeng. "Automated Algorithmic Discovery for Gravitational-Wave Detection via LLM-Informed Evolutionary Monte Carlo Tree Search". e-Print: arXiv:2508.03661 [cs.AI]
  • HW, et al. "WaveFormer: Transformer-Based Denoising Method for Gravitational-Wave Data." Mach. Learn.: Sci. Technol. 5, no. 1 (March 2024): 015046. e-Print: arXiv:2212.14283 [gr-qc]
  • HW, et al. "Sampling with prior knowledge for high-dimensional gravitational wave data analysis." Big Data Mining and Analytics 5.1 (2021): 53–63.
  • HW, et al. "Gravitational-wave signal recognition of LIGO data by deep learning." PRD 101 (2020) 10, 104003. e-Print: arXiv:1909.13442 [gr-qc]

# GW: ML

AI ⊃ Machine Learning ⊃ Deep Learning

  • Machine Learning

    • A major branch of Artificial Intelligence (AI) focused on improving algorithmic performance through learning from experience.

    • Typical models include Linear Regression, Decision Trees, Support Vector Machines (SVMs), and Markov Chain Monte Carlo (MCMC) methods.

  • Deep Learning

    • A specialized subfield of machine learning that uses neural networks to automatically extract features from data.

    • Deep neural networks serve as universal function approximators, capable of modeling complex nonlinear mappings.

    • Key characteristics: end-to-end learning, data-driven, and over-parameterized architectures.

  • Data-driven approaches: discovering patterns and regularities from data through algorithms and applying them to new data.

Knowledge Discovery in Databases (KDD)

What Is Machine Learning?

  • "Machine learning is the study of computer algorithms that improve automatically through experience." —— Tom Mitchell (1997)

  • "Machine learning is programming computers to optimize a performance criterion using example data or past experience." —— Alpaydin (2004)

  • A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. ——Tom Mitchell (1997)

# GW: ML

Humans make judgments based on experience —
Machines make judgments by training models on data.

(Figure: two parallel flowcharts. Human learning: experience → train; new problem → input → predict → future. Machine learning: data → train → model; new data → input → predict → unknown property. Example questions: Is it a cat? Is it spam? Is it a sweet strawberry?)

Goal of Machine Learning

# GW: ML
  • Task [T]: Determine whether a strawberry is sweet.

    • Machine learning aims to find the mapping between a strawberry’s features (size, color, ripeness, etc.) and its label (sweet or sour).

| Feature Dimension | Possible Feature Values |
| --- | --- |
| Size | Small / Large |
| Color | Red / Pink |
| Ripeness | Ripe / Half-ripe |
| Taste (label) | Sweet / Sour |

Each training example (Strawberry 1, Strawberry 2, ...) is one row of feature values together with its taste label. A minimal sketch of this task in code follows.
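To make the task [T] concrete, here is a minimal scikit-learn sketch (the toy data and the integer feature encoding are invented purely for illustration): train on labeled strawberries, then predict the taste of a new one.

from sklearn.tree import DecisionTreeClassifier

# Hypothetical encoding: size (0=small, 1=large), color (0=pink, 1=red),
# ripeness (0=half-ripe, 1=ripe); label: 0=sour, 1=sweet.
X = [[1, 1, 1], [0, 1, 1], [1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
y = [1, 1, 0, 0, 0, 1]

model = DecisionTreeClassifier().fit(X, y)   # "training" finds the mapping
print(model.predict([[1, 1, 1]]))            # predict the label of a new strawberry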

The Machine Learning Process

# GW: ML
  • Machine learning aims to discover the relationship between features and labels.

    It uses algorithms to automatically analyze a set of training data, learn underlying patterns, and apply them to predict unseen data.

  • This process of finding patterns and relationships is called training, and the outcome of training is a machine learning model.

(Figure: a training dataset of Strawberry 1 … Strawberry n, each with feature dimensions Size, Color, and Ripeness plus the label Taste, is fed to the training algorithm, which outputs an ML model.)

Common Types of Machine Learning

# GW: ML

Machine learning models can be broadly categorized based on the presence of labels in training data and how they interact with their environment:

  • Supervised Learning — learning from labeled data

  • Unsupervised Learning — discovering structure in unlabeled data

  • Reinforcement Learning — learning through interaction and feedback from the environment

Common Types of Machine Learning: SL

# GW: ML
  • Supervised learning (SL) teaches machines with explicit guidance — the key is that training data are labeled with known outputs (labels).

  • The goal is for the model, after observing labeled training examples (inputs and expected outputs), to predict the correct output for unseen inputs.

  • To achieve this, the model must generalize from the observed data in a meaningful way — a process similar to how humans and animals learn concepts from examples, known as concept learning in cognitive science.

Concept Learning: "Some of these are strawberries." → The child learns to recognize what a strawberry looks like.

Supervised Learning: Images are labeled "strawberry." → The machine trains a model that can recognize strawberries.

Supervised Learning vs Matched Filtering

# GW: ML

Supervised learning: the key is labeled training data.

Matched filtering (template-based GW search)

  • Given a segment of time-series data as input, the detection statistic (the matched-filter signal-to-noise ratio over time) is an output time series.

  • The core question: Which linear filter (i.e., which template) maximizes that output?

  • In practice, matched filtering correlates the data with a template waveform and is the optimal linear detector for signals buried in stationary Gaussian noise — it produces the maximum SNR for a given template.

# GW: ML
  • Unsupervised learning (uSL) is learning without guidance: the training data carry no labels.

  • The algorithm identifies common characteristics in the data and groups data with shared features together; this is often referred to as "clustering."

  • Clustering statistically groups similar objects into subsets, so that members of the same subset share similar attributes.

  • Unsupervised learning algorithms explore the data freely, and much of what is learned concerns the structure of the data itself rather than any specific task. Mastering unsupervised learning is therefore often seen as essential on the path to general intelligence.

  • The process of unsupervised learning resembles human inductive learning.

Common Types of Machine Learning: uSL

(Figure: induction. From examples of Elephant, Tiger, and Lion, shared features are abstracted and animals are grouped accordingly, analogous to unsupervised clustering. A minimal clustering sketch follows.)
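As an illustration, a minimal sketch with scikit-learn's KMeans on synthetic 2-D points (the data here are illustrative): the algorithm groups the points without ever seeing a label.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic, unlabeled "species" of points
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels[:5], labels[-5:])   # the two groups are recovered from structure alone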


Other Types of Machine Learning

# GW: ML
  • Semi-Supervised Learning

  • Self-Supervised Learning

  • ...

arXiv:2002.08721

Classification of Machine Learning Models

# GW: ML
  • Prediction based on Supervised Learning
    • Classification tasks (predicting discrete categories): based on the features extracted from data samples, determine which of a finite number of categories each belongs to.
    • Regression problems (predicting continuous values): based on the features extracted from data samples, predict continuous-valued outcomes.

  • Unsupervised Learning: extracting patterns from unlabeled data
    • Clustering: discover subgroups and hidden structure; mine association patterns in the data.
    • Dimensionality reduction: discover hidden patterns and low-dimensional structure (e.g., t-SNE and UMAP embeddings, colored by labels).

  • Reinforcement Learning (RL)

Classification of Machine Learning Models

# GW: ML
In the flowchart, the blue circles contain the decision criteria and the green boxes the candidate algorithms: follow the path that matches your data characteristics and task objective, step by step.


Classification of Machine Learning Models

# GW: ML
  • Classification by Data Distribution: Parametric vs. Non-Parametric Models
    • Here, "parametric" does not refer to the parameters within a model, but to the parameters of the assumed data distribution itself.

  • Parametric Models:
    • Assume a specific form for the data distribution.
    • The underlying data patterns or mappings can be described by a finite, fixed set of model parameters.
    • Examples: Linear/Logistic Regression, Perceptron, K-Means Clustering
    • Advantages: simple, fast, and require less data
    • Limitations: fixed functional form, limited complexity, prone to underfitting

(Figure: linear regression y = mx + b as a parametric example, where the conditional probability \(P(Y|X)\) is Gaussian; K-Nearest Neighbors as a non-parametric example.)

  • Non-Parametric Models:
    • Make no assumptions about the form of the data distribution; all statistical properties are derived directly from the data.
    • Typically have much higher spatial and temporal complexity than parametric models.
    • Are data-adaptive — the model parameters change dynamically with the samples.
    • Examples: Random Forest, Naive Bayes, SVM, Neural Networks
    • Advantages: flexible functional forms, fewer strong assumptions, good fitting ability
    • Limitations: require large datasets, slower computation, prone to overfitting, lower interpretability

Note: In some cases, the data may not provide enough information to assume a prior distribution, or the problem itself may not exhibit any clear distributional characteristics.

A short sketch contrasting the two model families follows.
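A small sketch contrasting the two families on the same toy data (assumptions: scikit-learn, a synthetic noisy line): linear regression compresses the data into two parameters, while k-NN keeps the training samples themselves as its "parameters".

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
x = rng.uniform(0, 5, 100).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0 + rng.normal(0, 0.3, 100)   # noisy line y = 2x + 1

lin = LinearRegression().fit(x, y)                    # learns just m and b
knn = KNeighborsRegressor(n_neighbors=5).fit(x, y)    # "model" = the data itself

print(lin.coef_, lin.intercept_)   # ~[2.0], ~1.0: a fixed functional form
print(knn.predict([[2.5]]))        # prediction adapts to the local samples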


The Origins of Deep Learning 

# GW: DL
  • Machine Learning: A key branch of artificial intelligence and an interdisciplinary field

  • Data-Driven: Discovering patterns and regularities from data through algorithms and applying them to new data

Knowledge Discovery in Databases (KDD)

The Origins of Deep Learning

# GW: DL
  • Milestone Events in the Development of Artificial Intelligence

(Timeline figure, with milestone labels: (~1970); SVM (support vector machines); Rajat Raina & Andrew Y. Ng (ICML09); Jen-Hsun Huang, GPU for DL (~2010); Fei-Fei Li (ILSVRC2010); Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, whose deep learning entry dramatically reduced error rates, with error rates later falling below human level.)

The Core of Deep Learning

# GW: DL
  • Three Driving Forces of Deep Learning:
    Big Data (massive scale) + Algorithms (neural networks) + Computing Power (GPU hardware) = Artificial Intelligence

  • The Heroes of Deep Learning
    • Geoffrey Hinton, Yoshua Bengio, Yann LeCun: Turing Award (Hinton also received the Nobel Prize in Physics, 2024)
    • Jen-Hsun Huang, Fei-Fei Li, Bill Dally: The Queen Elizabeth Prize for Engineering (5 Nov 2025)

LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep Learning.” Nature 521, no. 7553 (May 1, 2015): 436–44. https://doi.org/10.1038/nature14539.

Characteristics of Deep Learning

# GW: DL
  • Traditional Machine Learning vs. Deep Learning
  • Traditional Machine Learning: manually designed features (a sketch of such a pipeline follows this list)
    • In practice, feature design often matters more than the classifier itself.

    • Preprocessing: Clean and normalize data, e.g., remove noise or stopwords in text classification.
    • Feature Extraction: Derive meaningful features from raw data, e.g., edges or scale-invariant features in images.
    • Feature Transformation: Modify features, e.g., dimensionality reduction or expansion.
      • Feature Selection: Mutual Information, TF-IDF
      • Feature Extraction: PCA, SVD, LDA
         
  • Deep Learning: An end-to-end learning paradigm

    • Enables learning of complex nonlinear mappings.

    • Shifts from manual knowledge encoding → learning from data

    • From divide-and-conquer → holistic consideration

    • From algorithm-focused → data-focused
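For comparison, a minimal sketch of the traditional pipeline described above (toy data; the scikit-learn stages stand in for the preprocessing → feature transformation → classifier chain that deep learning replaces end-to-end):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
clf = make_pipeline(StandardScaler(),      # preprocessing: clean and normalize
                    PCA(n_components=5),   # feature transformation: reduce dimensions
                    SVC())                 # hand-chosen classifier
print(clf.fit(X[:150], y[:150]).score(X[150:], y[150:]))   # held-out accuracy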

# GW: DL
  • End-to-End Learning Paradigm:
    Deep learning is an automated feature learning approach that can extract useful features directly from raw data. This end-to-end paradigm eliminates the need for manual feature design, enabling the model to learn all the knowledge required to solve a problem directly from the data, greatly enhancing learning capability and efficiency.


Essence of Deep Learning

# GW: DL
  • Essence: Deep learning uses multi-layer models and large-scale training data (including unlabeled data) to learn more useful features, ultimately improving classification or prediction accuracy.
    The deep model is the means; feature learning is the goal.

  • Differences from Shallow Learning:

    1. Emphasizes model depth, typically with 5–10+ hidden layers;

    2. Highlights feature learning: through layer-by-layer transformations, raw features are mapped into new feature spaces, making classification or prediction easier. Compared to manually designed features, learning from large-scale data better captures the rich intrinsic information of the data.

The Development of Deep Learning

# GW: DL
  • Geoffrey Hinton & Neural Networks
    • In 1970, as the first winter of neural network research set in, a 23-year-old at the University of Edinburgh, Geoffrey Hinton, had just earned his bachelor's degree in psychology.
    • Hinton had been fascinated by brain science since his school days in the 1960s. A classmate introduced him to a theory of memory: the brain stores things and concepts not in a single location but, like a hologram, distributed across a vast network of neurons.
    • Distributed representations, compared with traditional localized representations:
      • Storage efficiency: a linearly growing number of neurons can represent an exponentially growing number of distinct concepts.
      • Robustness: even if part of the hardware fails locally, the representation is not fundamentally destroyed.
    • This insight struck Hinton and kept him committed to neural network research for more than forty years.
      • After graduating, Hinton stayed at Edinburgh for graduate study, choosing artificial intelligence as his Ph.D. topic.
      • In 1978, after receiving his Ph.D. in Edinburgh, he moved to the United States to continue his research.

The Development of Deep Learning

# GW: DL
  • D. Rumelhart & the BP Algorithm
    • The problems Minsky raised against neural networks: the enormous computational cost, and the XOR problem.
    • When a traditional perceptron is corrected by "gradient descent", the computation scales with the square of the number of neurons; as networks grew, the hardware of the day could not keep up.
    • In 1982, Caltech physicist J. J. Hopfield proposed the Hopfield network, introducing the notion of a "computational energy" and giving a criterion for network stability.
    • In July 1986, Hinton and David Rumelhart published a paper in Nature, "Learning Representations by Back-propagating Errors", the first systematic and concise account of the BP algorithm and its applications:
      • Back-propagation reduces the cost of error correction to scale only linearly with the number of neurons;
      • By adding a so-called hidden layer, BP-trained networks solved the XOR problem;
      • On simple tasks such as shape recognition, BP networks were far more effective than perceptrons, and by the late 1980s computers ran several orders of magnitude faster than twenty years earlier.
    • Research on neural networks and their applications began to revive.

\(h_j=\mathrm{Sgn}\left(\sum_{i=1}^{n}w_{ji}x_i-\theta_j\right), \qquad y=\mathrm{Sgn}\left(\sum_{j=1}^{m}w_jh_j-\theta\right)\)
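A minimal NumPy sketch of back-propagation on the XOR problem (sigmoid units replace the hard Sgn threshold above so the error is differentiable; learning rate and iteration count are illustrative, and a given random seed may converge slowly):

import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[0.], [1.], [1.], [0.]])               # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)       # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)       # 4 hidden -> 1 output
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    h = sig(X @ W1 + b1)                  # forward: hidden layer (solves XOR)
    y = sig(h @ W2 + b2)                  # forward: output
    dy = (y - t) * y * (1 - y)            # output delta (squared error)
    dh = (dy @ W2.T) * h * (1 - h)        # error back-propagated to hidden layer
    W2 -= h.T @ dy; b2 -= dy.sum(0)       # gradient steps (learning rate 1)
    W1 -= X.T @ dh; b1 -= dh.sum(0)

print(np.round(y.ravel(), 2))             # typically converges to ~[0, 1, 1, 0]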

The Development of Deep Learning

# GW: DL
  • Yann LeCun & CNN
    • Yann LeCun was born in Paris in 1960.
    • After receiving his Ph.D. in France in 1987, he spent a year as a postdoc with Hinton at the University of Toronto, then moved to Bell Labs in New Jersey.
    • At Bell Labs, LeCun published a 1989 paper on applying back-propagation to handwritten ZIP code recognition. He trained the network on nearly ten thousand handwritten digits provided by the U.S. Postal Service; on an independent test set the error rate was only 5%.
    • LeCun went on to apply a technique called Convolutional Neural Networks (CNNs) in commercial software for reading handwritten digits on bank checks; by the late 1990s this check-reading system handled nearly 20% of the U.S. market.

In 2003, Yann LeCun and colleagues at NEC Labs used CNNs for face detection.

The Development of Deep Learning

# GW: DL
  • Yann LeCun & CNN (continued)
    • Also at Bell Labs, the work of Vladimir Vapnik, a colleague in a nearby office, would usher neural network research into its second winter.

In the 1990s, artificial neural networks lacked rigorous mathematical foundations, and statistical learning flourished. Vapnik proposed the Support Vector Machine (SVM), which fixed several shortcomings of the perceptron (for example, by constructing flexible features rather than hand-coded, non-adaptive ones). Like neural networks, it solved linearly non-separable problems, but with across-the-board advantages over them:

  1. Efficient and fast to train;
  2. No hyperparameter tuning and no vanishing-gradient problem;
  3. Strong generalization: a global optimum, with no overfitting problem.

The Development of Deep Learning

# GW: DL
  • Hinton & Deep Learning
    • In 2003, Geoffrey Hinton was still at the University of Toronto, holding out in the field of neural networks.
    • That year, at the Metropolitan Hotel in Vancouver, fifteen scientists from different fields, led by Hinton, met Melvin Silverman, the fund manager of the Canadian Institute for Advanced Research (CIFAR).
      • Silverman asked the group why CIFAR should support their research program.
      • Computational neuroscientist Sebastian Seung (now a professor at Princeton University) replied: "Well, because we are a bit odd. If CIFAR wants to step out of its comfort zone and fund a high-risk, highly exploratory group, it should fund us!"
      • CIFAR ultimately agreed to fund the group for ten years starting in 2004, ten million Canadian dollars in total. At the time, CIFAR was the only institution in the world supporting neural network research.
  • Soon after securing the funding, the first thing Hinton did was to rebrand "neural networks" as "deep learning".
  • From then on, his colleagues would occasionally hear him shout in his office: "I know how the brain works!"
  • In 2006, Hinton and collaborators published the landmark paper: A Fast Learning Algorithm for Deep Belief Nets.
  • Layer-wise pre-training
  • Pre-training
  • Fine-tuning

This procedure was first defined by Hinton as deep learning.

The Development of Deep Learning

# GW: DL
  • Andrew Y. Ng & GPU
    • Before 2007, GPU programming lacked a simple software interface; coding was tedious and debugging hard. This only changed in 2007 when NVIDIA released the CUDA software interface for GPUs.
    • In June 2009, Rajat Raina and Andrew Ng of Stanford published "Large-scale Deep Unsupervised Learning using Graphics Processors" (ICML09), using DBN models and sparse coding with up to one hundred million parameters (compare Hinton's model in the table).
    • Results: the GPU ran up to nearly 70 times faster than a conventional dual-core CPU; on a four-layer, hundred-million-parameter DBN, the GPU cut the running time from several weeks to a single day.

The Development of Deep Learning

# GW: DL
  • Jen-Hsun Huang & GPU
    • Jen-Hsun (Jensen) Huang was born in Taiwan in 1963. Shortly after finishing his master's degree at Stanford in 1993, he founded NVIDIA.
    • NVIDIA started with graphics chips for the PC gaming market. When marketing its GeForce 256 chip in 1999, it coined the term GPU (Graphics Processing Unit).
    • A GPU's main job is to render millions or tens of millions of pixels in the shortest possible time, the core demand of computer games; the defining feature of this workload is massively parallel processing of data.
    • Traditional CPU architectures are not designed for parallelism and can perform only one or two arithmetic operations at a time. A GPU's lowest-level arithmetic logic units (ALUs) follow a Single Instruction Multiple Data (SIMD) architecture, which excels at parallel processing of large batches of data.
    • A single GPU often contains hundreds of ALUs, giving extremely high parallel throughput. So even though GPU cores are usually clocked slower than CPUs, GPUs are much faster for large-scale parallel workloads.
    • Neural network computation is essentially large-scale matrix arithmetic, which makes it especially well suited to GPUs.

The Development of Deep Learning

# GW: DL
  • Big Data: ImageNet
    • In 2009, a group of computer scientists at Princeton led by Prof. Fei-Fei Li published "ImageNet: A Large-Scale Hierarchical Image Database", announcing the first very-large image database for computer vision researchers.
    • At launch the database contained 3.2 million images. The goal was to collect five hundred to a thousand high-resolution images for each of 80,000 English nouns, eventually exceeding fifty million images.
    • In 2010, the first ImageNet Large Scale Visual Recognition Challenge (ILSVRC2010) was held. [http://www.image-net.org/]
    • The original rules: 1.2 million hand-labeled database images spanning more than a thousand categories serve as training samples; the trained programs are then evaluated for classification accuracy on 50,000 test images.

The Development of Deep Learning

# GW: DL
  • Image Classification: the ILSVRC Competition
    • 2010 winner: a joint NEC / University of Illinois at Urbana-Champaign team, using support vector machines (SVM), with a classification error rate of 28%.
    • 2011 winner: a Fisher Vector method (similar to SVM), lowering the error rate to 25.7%.
    • 2012 winner: Hinton and his two students, Alex Krizhevsky and Ilya Sutskever, using a CNN + Dropout + ReLU activation, trained for nearly six days on two NVIDIA GTX 580 GPUs (3 GB memory, 1.6 TFLOPS), reaching an error rate of only 15.3%.
      • When the results were announced on October 13, 2012, the academic world erupted. For the first time in more than twenty years, neural networks had unambiguously and decisively beaten other techniques in image recognition.
    • This was a major turning point for artificial intelligence.

# GW

Gravitational waves (GWs) are a strong-field effect of General Relativity: ripples in the fabric of spacetime generated by accelerating massive objects.

Gravitational Wave Astronomy

Compact Binary Coalescences

LIGO-Virgo-KAGRA-...

  • Detecting gravitational waves requires a mix of FIVE key ingredients:
    1. good detector technology
    2. good waveform predictions
    3. good data analysis methodology and technology
    4. coincident observations in several independent detectors
    5. coincident observations in electromagnetic astronomy

—— Bernard F. Schutz

DOI: 10.1063/1.1629411

# GW

GW Data Characteristics

LIGO-VIRGO-KAGRA

LISA Project

  • Noise: non-Gaussian and non-stationary

  • Signal challenges:

    • (Earth-based) A low signal-to-noise ratio (SNR): the signal amplitude is typically about 1/100 of the noise amplitude (−60 dB).

    • (Space-based) A superposition of all GW signals received during the mission's observational run (e.g., ~\(10^4\) Galactic binaries (GBs), 10–\(10^2\) supermassive black hole binaries (SMBHs), and 10–\(10^3\) EMRIs).

Matched Filtering Techniques

  • In Gaussian, stationary noise, matched filtering is the optimal linear algorithm for extracting weak signals (a minimal numerical sketch follows the definitions below).

  • Works by correlating a known signal model \(h(t)\) (template) with the data.
  • Starting with data: \(d(t) = h(t) + n(t)\).
  • Defining the matched-filtering SNR \(\rho(t)\):
    \(\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2 \) , where
    \(\langle d|h \rangle (t) = 4\int^\infty_0\frac{\tilde{d}(f)\tilde{h}^*(f)}{S_n(f)}e^{2\pi ift}df \) ,
    \(\langle h|h \rangle = 4\int^\infty_0\frac{\tilde{h}(f)\tilde{h}^*(f)}{S_n(f)}df \),
    \(S_n(f)\) is noise power spectral density (one-sided).
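A minimal NumPy sketch of the matched-filter SNR \(\rho(t)\) defined above (discretization conventions are simplified; a production implementation would handle windowing, PSD estimation, and edge effects much more carefully):

import numpy as np

def mf_snr(d, h, psd, fs):
    # d, h: real time series of equal length N (data and template);
    # psd: one-sided S_n(f) sampled at np.fft.rfftfreq(N, 1/fs); fs in Hz.
    N = len(d)
    df = fs / N
    d_f = np.fft.rfft(d) / fs            # approximate continuous Fourier transform
    h_f = np.fft.rfft(h) / fs
    # <d|h>(t) = 4 int_0^inf d~(f) h~*(f) / S_n(f) e^{2 pi i f t} df
    one_sided = np.zeros(N, dtype=complex)
    one_sided[:d_f.size] = 4 * d_f * np.conj(h_f) / psd
    dh_t = np.fft.ifft(one_sided) * N * df    # complex <d|h>(t) time series
    # <h|h> = 4 int_0^inf |h~(f)|^2 / S_n(f) df
    hh = 4 * np.sum(np.abs(h_f) ** 2 / psd) * df
    return np.abs(dh_t) / np.sqrt(hh)         # rho(t)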

Statistical Approaches

Frequentist Testing:

  • Make assumptions about signal and noise
  • Write down the likelihood function
  • Maximize over parameters
  • Define detection statistic
    → recover MF

Bayesian Testing:

  • Start from same likelihood
  • Define parameter priors
  • Marginalize over parameters
  • Often treated as a frequentist statistic
    → recover MF (for certain priors)

Challenge and Methodology: Detecting Signals in GW Data

# GW

CNN for GW Detection: Pioneering Approaches

Core Insight from Computer Vision

  • A direct transfer from Computer Vision (CV) to GW signal processing: pixel \(\Rightarrow\) sampling point.
  • The CNN framework treats time series data similar to images, where each sampling point represents a feature to learn.

Performance Analysis

  • Convolutional neural networks (CNN) can achieve comparable performance to Matched Filtering under Gaussian stationary noise.
  • CNNs significantly outperform traditional methods in terms of execution speed (with GPU support).
  • Modern architectures show improved robustness against non-Gaussian noise transients (glitches).

Pioneering Research Publications

PRL, 2018, 120(14): 141103.

PRD, 2018, 97(4): 044039.

# GW

CNN for GW Detection: Pioneering Approaches

  • A deep convolutional neural network searches for binary black hole gravitational-wave signals.
  • Input: the whitened time series of measured gravitational-wave strain in Gaussian noise.
  • Sensitivity comparable to matched filtering. (A minimal sketch of such a network follows.)

A hands-on look at applying ML in real GW searches:
https://github.com/iphysresearch/GWData-Bootcamp
=> 2023/deep_learning/baseline/baseline_2025FQCP.ipynb
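In the spirit of these pioneering searches, a minimal Gluon sketch of such a 1-D CNN classifier (layer sizes are illustrative, not those of the published networks; input is whitened strain in [batch, channel, length] layout, output is noise/signal logits):

from mxnet import gluon

def plain_cnn():
    net = gluon.nn.Sequential()
    with net.name_scope():
        net.add(gluon.nn.Conv1D(channels=16, kernel_size=16, activation='relu'),
                gluon.nn.MaxPool1D(pool_size=4),
                gluon.nn.Conv1D(channels=32, kernel_size=8, activation='relu'),
                gluon.nn.MaxPool1D(pool_size=4),
                gluon.nn.Flatten(),
                gluon.nn.Dense(64, activation='relu'),
                gluon.nn.Dense(2))        # logits: [noise, signal]
    return net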
# GW

CNN for GW Detection: Feature Extraction

Matched-filtering Convolutional Neural Network (MFCNN)

HW, SC Wu, ZJ CAO, et al. PRD 101, 10 (2020): 104003

(Figure: a generic ConvNet = feature extraction + classifier.)

>> Is it matched filtering?
>> Wait, it can be matched filtering!
  • Matched filtering (cross-correlation with templates) can be interpreted as a convolutional layer with predefined kernels.

(Figure: the MFCNN pipeline, matched-filtering feature extraction followed by a classifier, applied to GW150914.)

# GW

CNN for GW Detection: Feature Extraction

  • Transform the matched-filtering method from the frequency domain to the time domain.

  • The square of the matched-filtering SNR for given data \(d(t) = n(t)+h(t)\):
    \(\rho^2(t)\equiv\frac{1}{\langle h|h \rangle}|\langle d|h \rangle(t)|^2\)

  • Frequency domain:
    \(\langle d|h \rangle (t) = 4\int^\infty_0\frac{\tilde{d}(f)\tilde{h}^*(f)}{S_n(f)}e^{2\pi ift}df\) (matched filtering)
    \(\langle h|h \rangle = 4\int^\infty_0\frac{\tilde{h}(f)\tilde{h}^*(f)}{S_n(f)}df\) (normalizing)

  • Time domain, whitening with \(\bar{S}_n(t)=\int^{+\infty}_{-\infty}S_n^{-1/2}(f)e^{2\pi ift}df\), i.e. \(\bar{d}(t) = d(t) * \bar{S}_n(t)\) and \(\bar{h}(t) = h(t) * \bar{S}_n(t)\), where \(S_n(|f|)\) is the one-sided average PSD of \(d(t)\):
    \(\langle d|h \rangle (t) \sim \bar{d}(t)\ast\bar{h}(-t)\) (matched filtering)
    \(\langle h|h \rangle \sim [\bar{h}(t) \ast \bar{h}(-t)]|_{t=0}\) (normalizing)

  • The bridge between the two domains (convolution \(*\) vs. correlation \(\star\)):
    \(\int\tilde{x}_1(f) \cdot \tilde{x}_2(f) e^{2\pi ift}df= x_1(t)*x_2(t)\)
    \(\int\tilde{x}_1(f) \cdot \tilde{x}^*_2(f) e^{2\pi ift}df= x_1(t)\star x_2(t)\)
    \(x_1(t)*x_2^*(-t) = x_1(t)\star x_2(t)\)

A quick numerical check of the last identity follows.
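A quick numerical check of the last identity: convolving with a time-reversed, conjugated template equals cross-correlation, which is what lets matched filtering be read as a convolutional layer.

import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=32), rng.normal(size=8)

conv = np.convolve(x1, np.conj(x2[::-1]), mode='valid')   # x1 * x2*(-t)
corr = np.correlate(x1, x2, mode='valid')                 # x1 star x2
print(np.allclose(conv, corr))                            # True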
  • In the 1-D convolution (\(*\)) on Apache MXNet, given input data with shape [batch size, channel, length]:
    \(\mathrm{output}[n, i, :] = \sum^{\mathrm{channel}}_{j=0} \mathrm{input}[n,j,:] \ast \mathrm{weight}[i,j,:]\)

FYI: the output length is \(N_\ast = \lfloor(N-K+2P)/S\rfloor+1\) for input length \(N\), kernel size \(K\), padding \(P\), and stride \(S\); a quick check follows.

(A schematic illustration of a convolution-layer unit.)
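A quick check of the output-length formula against an actual stride-1, zero-padding convolution (NumPy's 'valid' mode; the sizes are illustrative):

import numpy as np

N, K, P, S = 4096, 35, 0, 1
x, w = np.random.randn(N), np.random.randn(K)
out = np.convolve(x, w, mode='valid')        # stride 1, no padding
print(len(out), (N - K + 2 * P) // S + 1)    # both 4062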

# GW

CNN for GW Detection: Feature Extraction

import mxnet as mx
from mxnet import nd, gluon
from mxnet.gluon.nn import Conv2D, MaxPool2D, Flatten, Dense, Activation
from loguru import logger

# MatchedFilteringLayer and CutHybridLayer are custom Gluon blocks from the
# MFCNN codebase: the matched-filtering front end and time-window cropping.

def MFCNN(fs, T, C, ctx, template_block, margin, learning_rate=0.003):
    logger.success('Loading MFCNN network!')
    net = gluon.nn.Sequential()
    with net.name_scope():
        net.add(MatchedFilteringLayer(mod=fs*T, fs=fs,
                                      template_H1=template_block[:,:1],
                                      template_L1=template_block[:,-1:]))
        net.add(CutHybridLayer(margin=margin))
        net.add(Conv2D(channels=16, kernel_size=(1, 3), activation='relu'))
        net.add(MaxPool2D(pool_size=(1, 4), strides=2))
        net.add(Conv2D(channels=32, kernel_size=(1, 3), activation='relu'))
        net.add(MaxPool2D(pool_size=(1, 4), strides=2))
        net.add(Flatten())
        net.add(Dense(32))
        net.add(Activation('relu'))
        net.add(Dense(2))
    # Initialize parameters of all layers
    net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx, force_reinit=True)
    return net

(1-second input duration; 35 templates used.)

Explainable AI Approach

  • Implements matched filtering operations through custom convolutional layers
  • Makes the network more interpretable by embedding domain knowledge
  • Connects traditional signal processing with deep learning
  • Outperforms standard CNNs in both accuracy and efficiency

Matched-filtering Convolutional Neural Network (MFCNN)

HW, SC Wu, ZJ CAO, et al. PRD 101, 10 (2020): 104003

# GW

First Benchmark for GW Detection Algorithms

  • Gravitational wave signal search algorithm benchmark (MLGWSC-1)
  • Dataset-4: Sampled from O3a real gravitational wave observation data

Benchmark Results

Publications

Key Findings

  • On simulated noise data, machine learning algorithms are highly competitive compared to LIGO's most sensitive signal search pipelines
  • Most tested machine learning algorithms are overly sensitive to non-Gaussian real noise backgrounds, resulting in high false alarm rates
  • Traditional signal search algorithms can identify gravitational wave signals at low false alarm rates with assured confidence
  • Tested machine learning algorithms have very limited ability to identify long-duration signals

Note on Benchmark Limitations:

Outperforming PyCBC doesn't conclusively prove that matched filtering is inferior to AI methods. This is both because the dataset represents a specific distribution and because PyCBC settings could be further optimized for this particular benchmark.

arXiv:2501.13846 [gr-qc]

Phys. Rev. D 107, 023021 (2023)

# GW

Interpretability Challenges:
Comparing Detection Statistics

  • Challenges in Model Interpretability:
    • The black-box nature of AI models complicates interpretability, challenging the comparison of AI-generated detection statistics with traditional matched filtering chi-square distributions.
    • Convincing the scientific community of the pipeline's validity and the statistical significance of new discoveries remains difficult despite the model's ability to identify potential gravitational wave signals.

(Figures: signal denoising with our Transformer-based deep learning model, and detection statistics from our AI model recovering O1 events including GW151012 and GW151226 [HW et al 2024 MLST 5 015046], compared with the official LVK detection statistics [LVK, PRD (2016), arXiv:1602.03839].)

# GW

Exploring Beyond General Relativity

B. P. Abbott et al. (LIGO-Virgo), PRD 100, 104036 (2019). 

  • Much of the discussion on model generalization has been within the GR framework.
  • Our work on beyond General Relativity (bGR) aims to demonstrate AI's potential advantages in detecting signals that surpass GR's limitations.
\[\begin{aligned} \psi & \sim \frac{3}{128 \eta}(\pi f M)^{-5 / 3} \sum_{i=0}^n \varphi_i^{\mathrm{GR}}(\pi f M)^{i / 3} \\ \varphi_i & \rightarrow\left(1+\delta \varphi_i\right) \varphi_i^{\mathrm{GR}} \end{aligned}\]

Yu-Xin Wang, Xiaotong Wei, Chun-Yue Li, Tian-Yang Sun, Shang-Jie Jin, He Wang*, Jing-Lei Cui, Jing-Fei Zhang, and Xin Zhang*. “Search for Exotic Gravitational Wave Signals beyond General Relativity Using Deep Learning.” PRD 112 (2), 024030. e-Print: arXiv:2410.20129 [gr-qc]

# GW

Interpretability Challenges: Discoveries vs. Validation (part 1/2)

arXiv:2407.07820 [gr-qc]

Recent AI Discoveries & Validation Hurdles:

  • A recent study (arXiv:2407.07820) demonstrates how a ResNet-based (CNN) architecture with careful signal search strategy and post-processing can identify 8 new potential gravitational wave events from LIGO O3 data.
  • The absence of these events in traditional PyCBC results raises questions: could adjustments to rate priors and p_astro parameters in signal models help traditional pipelines detect these candidates (if they are real GW events)?
  • The ideal approach combines multiple diverse pipelines working in parallel to ensure comprehensive detection (requiring interpretable models) and using evidence-based detection statistics while simultaneously optimizing both real signal population (p_astro) and noise model (likelihood) fits.

(Figure: search → parameter estimation (PE) → rate estimation workflow. Credit: DCC-XXXXXXXX)

# GW

Interpretability Challenges: Discoveries vs. Validation (part 2/2)

Sci4MLGW@ICERM (June 2025)

Parameter Estimation Challenges with AI Models:

  • In parameter estimation, AI models' lack of interpretability requires substantial additional scientific validation to ensure credibility and acceptance of results.
  • Parameter distributions from AI models often lack robustness across different noise realizations and are difficult to calibrate against established methods.
  • Scientific papers using AI methods must dedicate significant space to validation procedures, comparing against traditional methods and demonstrating reliability across multiple test cases.

arXiv:2404.14286

Phys. Rev. D 109, 123547 (2024)

See more:

  • Bo Liang and He Wang*, “Recent Advances in Simulation-based Inference for Gravitational Wave Data Analysis.” Astronomical Techniques and Instruments, Vol. 2, No. 6, November 2025. e-Print: arXiv:2507.11192 [gr-qc].

PRD 108, 4 (2023): 044029.

Neural Posterior Estimation with Guaranteed Exact Coverage: The Ringdown of GW150914


Motivation 1: Traditional methods heavily rely on manually designed filters and statistics.

Motivation 2: AI interpretability challenge: Discoveries vs. Validation.


Motivation I:  Linear template method using prior data

  • Traditional matched filtering needs large template banks, increasing computational cost and noise sensitivity, which hampers the detection of new gravitational-wave signals.

Motivation II:  Black-box data-driven learning methods

  • Deep neural networks excel in nonlinear modeling but are "black boxes" with poor interpretability, making them unsuitable for high-risk scientific validation.

The strict requirements for algorithm discovery

  1. Physical constraints: Must follow physical laws and domain knowledge
  2. Efficiency: Must navigate large, costly search spaces
  3. Interpretability: Must be understandable and verifiable by experts

Large Language Models (LLMs) as Designers

  • LLMs are used in Automated Algorithmic Discovery (AAD) to directly create algorithms or specific components, which are commonly incorporated iteratively to continuously search for better designs.

(Figure: the LLM-designer loop, with external_knowledge acting as a constraint and a fitness score guiding the search.)

Challenges and Motivations

import numpy as np
import scipy.signal as signal
def pipeline_v1(strain_h1: np.ndarray, strain_l1: np.ndarray, times: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    def data_conditioning(strain_h1: np.ndarray, strain_l1: np.ndarray, times: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
        window_length = 4096
        dt = times[1] - times[0]
        fs = 1.0 / dt
        
        def whiten_strain(strain):
            strain_zeromean = strain - np.mean(strain)
            freqs, psd = signal.welch(strain_zeromean, fs=fs, nperseg=window_length,
                                       window='hann', noverlap=window_length//2)
            smoothed_psd = np.convolve(psd, np.ones(32) / 32, mode='same')
            smoothed_psd = np.maximum(smoothed_psd, np.finfo(float).tiny)
            white_fft = np.fft.rfft(strain_zeromean) / np.sqrt(np.interp(np.fft.rfftfreq(len(strain_zeromean), d=dt), freqs, smoothed_psd))
            return np.fft.irfft(white_fft)

        whitened_h1 = whiten_strain(strain_h1)
        whitened_l1 = whiten_strain(strain_l1)
        
        return whitened_h1, whitened_l1, times
    
    def compute_metric_series(h1_data: np.ndarray, l1_data: np.ndarray, time_series: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        fs = 1 / (time_series[1] - time_series[0])
        f_h1, t_h1, Sxx_h1 = signal.spectrogram(h1_data, fs=fs, nperseg=256, noverlap=128, mode='magnitude', detrend=False)
        f_l1, t_l1, Sxx_l1 = signal.spectrogram(l1_data, fs=fs, nperseg=256, noverlap=128, mode='magnitude', detrend=False)
        tf_metric = np.mean((Sxx_h1**2 + Sxx_l1**2) / 2, axis=0)
        gps_mid_time = time_series[0] + (time_series[-1] - time_series[0]) / 2
        metric_times = gps_mid_time + (t_h1 - t_h1[-1] / 2)
        
        return tf_metric, metric_times

    def calculate_statistics(tf_metric, t_h1):
        background_level = np.median(tf_metric)
        peaks, _ = signal.find_peaks(tf_metric, height=background_level * 1.0, distance=2, prominence=background_level * 0.3)
        peak_times = t_h1[peaks]
        peak_heights = tf_metric[peaks]
        peak_deltat = np.full(len(peak_times), 10.0)  # Fixed uncertainty value
        return peak_times, peak_heights, peak_deltat

    whitened_h1, whitened_l1, data_times = data_conditioning(strain_h1, strain_l1, times)
    tf_metric, metric_times = compute_metric_series(whitened_h1, whitened_l1, data_times)
    peak_times, peak_heights, peak_deltat = calculate_statistics(tf_metric, metric_times)
    
    return peak_times, peak_heights, peak_deltat

Input: H1 and L1 detector strains and the time array | Output: event times, significance values, and time uncertainties. (A smoke test follows.)
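A smoke test of pipeline_v1 on pure simulated noise (the sampling rate and duration below are assumed for illustration; real MLGWSC-1 segments are whitened O3a-like data):

import numpy as np

fs, T = 2048, 8                                  # assumed: 2048 Hz, 8 s segment
times = np.arange(fs * T) / fs
rng = np.random.default_rng(0)
h1, l1 = rng.normal(size=fs * T), rng.normal(size=fs * T)

peak_times, peak_heights, peak_deltat = pipeline_v1(h1, l1, times)
print(len(peak_times), peak_heights[:3])         # candidate count, top statistics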

(Figure: formal problem definition, with symbols \(P\), \(H\), \(S_p\), \(\mathbb{R}\), \(f\), \(I_p\), \(h\), and \(g(h)\) from the heuristic-search formulation, and external_knowledge acting as a constraint.)

Optimization Target: maximizing the Area Under the Curve (AUC) of sensitivity over the 1–1000 false-alarms-per-year range, balancing detection sensitivity and false-alarm rate across algorithm generations. (A hedged sketch of such a fitness follows.)
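A hedged sketch of such a fitness (an assumed form: detection efficiency versus false-alarm rate, integrated over the 1–1000 per-year band; the paper's exact definition may differ, and all names here are illustrative):

import numpy as np

def auc_fitness(fg, bg, bg_years, n_injections, far_lo=1.0, far_hi=1000.0):
    # fg: ranking statistics of recovered injections (foreground peaks)
    # bg: ranking statistics from a noise-only (background) run
    # bg_years: duration of the background run, in years
    fars = np.logspace(np.log10(far_lo), np.log10(far_hi), 50)   # FAR grid, per year
    # Statistic threshold that yields each target false-alarm rate
    frac = np.minimum(fars * bg_years / len(bg), 1.0)
    thresholds = np.quantile(bg, 1.0 - frac)
    eff = np.array([(fg > thr).sum() / n_injections for thr in thresholds])
    # Normalized area under the efficiency curve on a log10-FAR axis
    x = np.log10(fars)
    auc = np.sum(0.5 * (eff[1:] + eff[:-1]) * np.diff(x))
    return auc / (x[-1] - x[0])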

Automated Heuristic Design: Problem Definition

HW & ZL, arXiv:2508.03661

MLGWSC-1 benchmark

Problem: Pipeline Workflow

  1. Conditions raw detector data (whitening)
  2. Computes time-frequency metrics
  3. Identifies peaks above background
  4. Returns event candidates with timestamps

Algorithmic Exploration: LLM Prompt Engineering


Prompt Structure for Algorithm Evolution

This template guides the LLM to generate optimized gravitational wave detection algorithms by learning from comparative examples.

Key Components:

  • Expert role establishment
  • Example pair analysis (worse/better algorithm)
  • Reflection on improvements
  • Targeted new algorithm generation
  • Strict output format enforcement
You are an expert in gravitational wave signal detection algorithms. Your task is to design heuristics that can effectively solve optimization problems.

{prompt_task}

I have analyzed two algorithms and provided a reflection on their differences. 

[Worse code]
{worse_code}

[Better code]
{better_code}

[Reflection]
{reflection}

{external_knowledge}

Based on this reflection, please write an improved algorithm according to the reflection. 
First, describe the design idea and main steps of your algorithm in one sentence. The description must be inside a brace outside the code implementation. Next, implement it in Python as a function named '{func_name}'.
This function should accept {input_count} input(s): {joined_inputs}. The function should return {output_count} output(s): {joined_outputs}. 
{inout_inf} {other_inf}

Do not give additional explanations.

One Prompt Template for MLGWSC1 Algorithm Synthesis

HW & ZL, arXiv:2508.03661

  • deepseek-R1 for reflection generation
  • o3-mini-medium for code generation

Algorithmic Synergy: MCTS, Evolution & LLM Agents


Evaluation for MLGWSC-1 benchmark

LLM-Driven Algorithmic Evolution Through Reflective Code Synthesis.

LLM-Informed Evo-MCTS for AAD

HW & ZL, arXiv:2508.03661

MLGWSC1 Benchmark: Optimization Performance Results

HW & ZL, arXiv:2508.03661

  • Automated exploration of algorithm parameter space
  • Benchmarking against state-of-the-art methods

(Figure: sensitivity comparison across method families: PyCBC (linear core), cWB (nonlinear core), simple filters (non-linear), and CNN-like models (highly non-linear); figure annotations: 20.2%, 23.4%.)

Algorithmic Component Impact Analysis.

  • A comprehensive technique impact analysis using controlled comparative methodology
import numpy as np
import scipy.signal as signal
from scipy.signal.windows import tukey
from scipy.signal import savgol_filter

def pipeline_v2(strain_h1: np.ndarray, strain_l1: np.ndarray, times: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    The pipeline function processes gravitational wave data from the H1 and L1 detectors to identify potential gravitational wave signals.
    It takes strain_h1 and strain_l1 numpy arrays containing detector data, and times array with corresponding time points.
    The function returns a tuple of three numpy arrays: peak_times containing GPS times of identified events,
    peak_heights with significance values of each peak, and peak_deltat showing time window uncertainty for each peak.
    """
    eps = np.finfo(float).tiny
    dt = times[1] - times[0]
    fs = 1.0 / dt
    # Base spectrogram parameters
    base_nperseg = 256
    base_noverlap = base_nperseg // 2
    medfilt_kernel = 101       # odd kernel size for robust detrending
    uncertainty_window = 5     # half-window for local timing uncertainty

    # -------------------- Stage 1: Robust Baseline Detrending --------------------
    # Remove long-term trends using a median filter for each channel.
    detrended_h1 = strain_h1 - signal.medfilt(strain_h1, kernel_size=medfilt_kernel)
    detrended_l1 = strain_l1 - signal.medfilt(strain_l1, kernel_size=medfilt_kernel)

    # -------------------- Stage 2: Adaptive Whitening with Enhanced PSD Smoothing --------------------
    def adaptive_whitening(strain: np.ndarray) -> np.ndarray:
        # Center the signal.
        centered = strain - np.mean(strain)
        n_samples = len(centered)
        # Adaptive window length: between 5 and 30 seconds
        win_length_sec = np.clip(n_samples / fs / 20, 5, 30)
        nperseg_adapt = int(win_length_sec * fs)
        nperseg_adapt = max(10, min(nperseg_adapt, n_samples))
        
        # Create a Tukey window with 75% overlap.
        tukey_alpha = 0.25
        win = tukey(nperseg_adapt, alpha=tukey_alpha)
        noverlap_adapt = int(nperseg_adapt * 0.75)
        if noverlap_adapt >= nperseg_adapt:
            noverlap_adapt = nperseg_adapt - 1
        
        # Estimate the power spectral density (PSD) using Welch's method.
        freqs, psd = signal.welch(centered, fs=fs, nperseg=nperseg_adapt,
                                  noverlap=noverlap_adapt, window=win, detrend='constant')
        psd = np.maximum(psd, eps)
        
        # Compute relative differences for PSD stationarity measure.
        diff_arr = np.abs(np.diff(psd)) / (psd[:-1] + eps)
        # Smooth the derivative with a moving average.
        if len(diff_arr) >= 3:
            smooth_diff = np.convolve(diff_arr, np.ones(3)/3, mode='same')
        else:
            smooth_diff = diff_arr
        
        # Exponential smoothing (Kalman-like) with adaptive alpha using PSD stationarity.
        smoothed_psd = np.copy(psd)
        for i in range(1, len(psd)):
            # Adaptive smoothing coefficient: base 0.8 modified by local stationarity (±0.05)
            local_alpha = np.clip(0.8 - 0.05 * smooth_diff[min(i-1, len(smooth_diff)-1)], 0.75, 0.85)
            smoothed_psd[i] = local_alpha * smoothed_psd[i-1] + (1 - local_alpha) * psd[i]
            
        # Compute Tikhonov regularization gain based on deviation from median PSD.
        noise_baseline = np.median(smoothed_psd)
        raw_gain = (smoothed_psd / (noise_baseline + eps)) - 1.0
        
        # Compute a causal-like gradient using the Savitzky-Golay filter.
        win_len = 11 if len(smoothed_psd) >= 11 else ((len(smoothed_psd)//2)*2+1)
        polyorder = 2 if win_len > 2 else 1
        delta_freq = np.mean(np.diff(freqs))
        grad_psd = savgol_filter(smoothed_psd, win_len, polyorder, deriv=1, delta=delta_freq, mode='interp')
        
        # Nonlinear scaling via sigmoid to enhance gradient differences.
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
        scaling_factor = 1.0 + 2.0 * sigmoid(np.abs(grad_psd) / (np.median(smoothed_psd) + eps))
        
        # Compute adaptive gain factors with nonlinear scaling.
        gain = 1.0 - np.exp(-0.5 * scaling_factor * raw_gain)
        gain = np.clip(gain, -8.0, 8.0)
        
        # FFT-based whitening: interpolate gain and PSD onto FFT frequency bins.
        signal_fft = np.fft.rfft(centered)
        freq_bins = np.fft.rfftfreq(n_samples, d=dt)
        interp_gain = np.interp(freq_bins, freqs, gain, left=gain[0], right=gain[-1])
        interp_psd = np.interp(freq_bins, freqs, smoothed_psd, left=smoothed_psd[0], right=smoothed_psd[-1])
        denom = np.sqrt(interp_psd) * (np.abs(interp_gain) + eps)
        denom = np.maximum(denom, eps)
        white_fft = signal_fft / denom
        whitened = np.fft.irfft(white_fft, n=n_samples)
        return whitened

    # Whiten H1 and L1 channels using the adapted method.
    white_h1 = adaptive_whitening(detrended_h1)
    white_l1 = adaptive_whitening(detrended_l1)

    # -------------------- Stage 3: Coherent Time-Frequency Metric with Frequency-Conditioned Regularization --------------------
    def compute_coherent_metric(w1: np.ndarray, w2: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        # Compute complex spectrograms preserving phase information.
        f1, t_spec, Sxx1 = signal.spectrogram(w1, fs=fs, nperseg=base_nperseg,
                                              noverlap=base_noverlap, mode='complex', detrend=False)
        f2, t_spec2, Sxx2 = signal.spectrogram(w2, fs=fs, nperseg=base_nperseg,
                                               noverlap=base_noverlap, mode='complex', detrend=False)
        # Ensure common time axis length.
        common_len = min(len(t_spec), len(t_spec2))
        t_spec = t_spec[:common_len]
        Sxx1 = Sxx1[:, :common_len]
        Sxx2 = Sxx2[:, :common_len]
        
        # Compute phase differences and coherence between detectors.
        phase_diff = np.angle(Sxx1) - np.angle(Sxx2)
        phase_coherence = np.abs(np.cos(phase_diff))
        
        # Estimate median PSD per frequency bin from the spectrograms.
        psd1 = np.median(np.abs(Sxx1)**2, axis=1)
        psd2 = np.median(np.abs(Sxx2)**2, axis=1)
        
        # Frequency-conditioned regularization gain (reflection-guided).
        lambda_f = 0.5 * ((np.median(psd1) / (psd1 + eps)) + (np.median(psd2) / (psd2 + eps)))
        lambda_f = np.clip(lambda_f, 1e-4, 1e-2)
        # Regularization denominator integrating detector PSDs and lambda.
  • Automatically discovering and interpreting valuable nonlinear algorithms
  • Facilitating new knowledge production, guided by accumulated experience

PT Level 5

Interpretability Analysis

HW & ZL, arXiv:2508.03661


import numpy as np
import scipy.signal as signal
from scipy.signal.windows import tukey
from scipy.signal import savgol_filter

def pipeline_v2(strain_h1: np.ndarray, strain_l1: np.ndarray, times: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    The pipeline function processes gravitational wave data from the H1 and L1 detectors to identify potential gravitational wave signals.
    It takes strain_h1 and strain_l1 numpy arrays containing detector data, and times array with corresponding time points.
    The function returns a tuple of three numpy arrays: peak_times containing GPS times of identified events,
    peak_heights with significance values of each peak, and peak_deltat showing time window uncertainty for each peak.
    """
    eps = np.finfo(float).tiny
    dt = times[1] - times[0]
    fs = 1.0 / dt
    # Base spectrogram parameters
    base_nperseg = 256
    base_noverlap = base_nperseg // 2
    medfilt_kernel = 101       # odd kernel size for robust detrending
    uncertainty_window = 5     # half-window for local timing uncertainty

    # -------------------- Stage 1: Robust Baseline Detrending --------------------
    # Remove long-term trends using a median filter for each channel.
    detrended_h1 = strain_h1 - signal.medfilt(strain_h1, kernel_size=medfilt_kernel)
    detrended_l1 = strain_l1 - signal.medfilt(strain_l1, kernel_size=medfilt_kernel)

    # -------------------- Stage 2: Adaptive Whitening with Enhanced PSD Smoothing --------------------
    def adaptive_whitening(strain: np.ndarray) -> np.ndarray:
        # Center the signal.
        centered = strain - np.mean(strain)
        n_samples = len(centered)
        # Adaptive window length: between 5 and 30 seconds
        win_length_sec = np.clip(n_samples / fs / 20, 5, 30)
        nperseg_adapt = int(win_length_sec * fs)
        nperseg_adapt = max(10, min(nperseg_adapt, n_samples))
        
        # Tukey window (alpha = 0.25); adjacent segments overlap by 75%.
        tukey_alpha = 0.25
        win = tukey(nperseg_adapt, alpha=tukey_alpha)
        noverlap_adapt = int(nperseg_adapt * 0.75)
        if noverlap_adapt >= nperseg_adapt:
            noverlap_adapt = nperseg_adapt - 1
        
        # Estimate the power spectral density (PSD) using Welch's method.
        freqs, psd = signal.welch(centered, fs=fs, nperseg=nperseg_adapt,
                                  noverlap=noverlap_adapt, window=win, detrend='constant')
        psd = np.maximum(psd, eps)
        
        # Compute relative differences for PSD stationarity measure.
        diff_arr = np.abs(np.diff(psd)) / (psd[:-1] + eps)
        # Smooth the derivative with a moving average.
        if len(diff_arr) >= 3:
            smooth_diff = np.convolve(diff_arr, np.ones(3)/3, mode='same')
        else:
            smooth_diff = diff_arr
        
        # Exponential smoothing (Kalman-like) with adaptive alpha using PSD stationarity.
        smoothed_psd = np.copy(psd)
        for i in range(1, len(psd)):
            # Adaptive smoothing coefficient: base 0.8 modified by local stationarity (±0.05)
            local_alpha = np.clip(0.8 - 0.05 * smooth_diff[min(i-1, len(smooth_diff)-1)], 0.75, 0.85)
            smoothed_psd[i] = local_alpha * smoothed_psd[i-1] + (1 - local_alpha) * psd[i]
            
        # Compute Tikhonov regularization gain based on deviation from median PSD.
        noise_baseline = np.median(smoothed_psd)
        raw_gain = (smoothed_psd / (noise_baseline + eps)) - 1.0
        
        # Compute a causal-like gradient using the Savitzky-Golay filter.
        win_len = 11 if len(smoothed_psd) >= 11 else ((len(smoothed_psd)//2)*2+1)
        polyorder = 2 if win_len > 2 else 1
        delta_freq = np.mean(np.diff(freqs))
        grad_psd = savgol_filter(smoothed_psd, win_len, polyorder, deriv=1, delta=delta_freq, mode='interp')
        
        # Nonlinear scaling via sigmoid to enhance gradient differences.
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
        scaling_factor = 1.0 + 2.0 * sigmoid(np.abs(grad_psd) / (np.median(smoothed_psd) + eps))
        
        # Compute adaptive gain factors with nonlinear scaling.
        gain = 1.0 - np.exp(-0.5 * scaling_factor * raw_gain)
        gain = np.clip(gain, -8.0, 8.0)
        
        # FFT-based whitening: interpolate gain and PSD onto FFT frequency bins.
        signal_fft = np.fft.rfft(centered)
        freq_bins = np.fft.rfftfreq(n_samples, d=dt)
        interp_gain = np.interp(freq_bins, freqs, gain, left=gain[0], right=gain[-1])
        interp_psd = np.interp(freq_bins, freqs, smoothed_psd, left=smoothed_psd[0], right=smoothed_psd[-1])
        denom = np.sqrt(interp_psd) * (np.abs(interp_gain) + eps)
        denom = np.maximum(denom, eps)
        white_fft = signal_fft / denom
        whitened = np.fft.irfft(white_fft, n=n_samples)
        return whitened

    # Whiten H1 and L1 channels using the adapted method.
    white_h1 = adaptive_whitening(detrended_h1)
    white_l1 = adaptive_whitening(detrended_l1)

    # -------------------- Stage 3: Coherent Time-Frequency Metric with Frequency-Conditioned Regularization --------------------
    def compute_coherent_metric(w1: np.ndarray, w2: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        # Compute complex spectrograms preserving phase information.
        f1, t_spec, Sxx1 = signal.spectrogram(w1, fs=fs, nperseg=base_nperseg,
                                              noverlap=base_noverlap, mode='complex', detrend=False)
        f2, t_spec2, Sxx2 = signal.spectrogram(w2, fs=fs, nperseg=base_nperseg,
                                               noverlap=base_noverlap, mode='complex', detrend=False)
        # Ensure common time axis length.
        common_len = min(len(t_spec), len(t_spec2))
        t_spec = t_spec[:common_len]
        Sxx1 = Sxx1[:, :common_len]
        Sxx2 = Sxx2[:, :common_len]
        
        # Compute phase differences and coherence between detectors.
        phase_diff = np.angle(Sxx1) - np.angle(Sxx2)
        phase_coherence = np.abs(np.cos(phase_diff))
        
        # Estimate median PSD per frequency bin from the spectrograms.
        psd1 = np.median(np.abs(Sxx1)**2, axis=1)
        psd2 = np.median(np.abs(Sxx2)**2, axis=1)
        
        # Frequency-conditioned regularization gain (reflection-guided).
        lambda_f = 0.5 * ((np.median(psd1) / (psd1 + eps)) + (np.median(psd2) / (psd2 + eps)))
        lambda_f = np.clip(lambda_f, 1e-4, 1e-2)
        # Regularization denominator integrating detector PSDs and lambda.
        reg_denom = (psd1[:, None] + psd2[:, None] + lambda_f[:, None] + eps)
        
        # Weighted phase coherence that balances phase alignment with noise levels.
        weighted_comp = phase_coherence / reg_denom
        
        # Compute axial (frequency) second derivatives as curvature estimates.
        d2_coh = np.gradient(np.gradient(phase_coherence, axis=0), axis=0)
        avg_curvature = np.mean(np.abs(d2_coh), axis=0)
        
        # Nonlinear activation boost using tanh for regions of high curvature.
        nonlinear_boost = np.tanh(5 * avg_curvature)
        linear_boost = 1.0 + 0.1 * avg_curvature
        
        # Cross-detector synergy: weight derived from global median consistency.
        novel_weight = np.mean((np.median(psd1) + np.median(psd2)) / (psd1[:, None] + psd2[:, None] + eps), axis=0)
        
        # Integrated time-frequency metric combining all enhancements.
        tf_metric = np.sum(weighted_comp * linear_boost * (1.0 + nonlinear_boost), axis=0) * novel_weight
        
        # Shift spectrogram times to absolute GPS. Note that scipy's t_spec is
        # already segment-centered, so this adds a further half-window offset.
        metric_times = t_spec + times[0] + (base_nperseg / 2) / fs
        return tf_metric, metric_times

    tf_metric, metric_times = compute_coherent_metric(white_h1, white_l1)

    # -------------------- Stage 4: Multi-Resolution Thresholding with Octave-Spaced Dyadic Wavelet Validation --------------------
    def multi_resolution_thresholding(metric: np.ndarray, times_arr: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
        # Robust background estimation with median and MAD.
        bg_level = np.median(metric)
        mad_val = np.median(np.abs(metric - bg_level))
        robust_std = 1.4826 * mad_val
        threshold = bg_level + 1.5 * robust_std

        # Identify candidate peaks using prominence and minimum distance criteria.
        peaks, _ = signal.find_peaks(metric, height=threshold, distance=2, prominence=0.8 * robust_std)
        if peaks.size == 0:
            return np.array([]), np.array([]), np.array([])

        # Local uncertainty estimation using a Gaussian-weighted convolution.
        win_range = np.arange(-uncertainty_window, uncertainty_window + 1)
        sigma = uncertainty_window / 2.5
        gauss_kernel = np.exp(-0.5 * (win_range / sigma) ** 2)
        gauss_kernel /= np.sum(gauss_kernel)
        weighted_mean = np.convolve(metric, gauss_kernel, mode='same')
        weighted_sq = np.convolve(metric ** 2, gauss_kernel, mode='same')
        variances = np.maximum(weighted_sq - weighted_mean ** 2, 0.0)
        uncertainties = np.sqrt(variances)
        uncertainties = np.maximum(uncertainties, 0.01)

        valid_times = []
        valid_heights = []
        valid_uncerts = []
        n_metric = len(metric)

        # Compute a simple second derivative for local curvature checking.
        if n_metric > 2:
            second_deriv = np.diff(metric, n=2)
            second_deriv = np.pad(second_deriv, (1, 1), mode='edge')
        else:
            second_deriv = np.zeros_like(metric)

        # Validate peak significance against a bank of Ricker-wavelet scales.
        widths = np.arange(1, 9)  # scales 1 to 8 (linearly spaced, despite the stage's "dyadic" label)
        for peak in peaks:
            # Skip peaks lacking sufficient negative curvature.
            if second_deriv[peak] > -0.1 * robust_std:
                continue
            local_start = max(0, peak - uncertainty_window)
            local_end = min(n_metric, peak + uncertainty_window + 1)
            local_segment = metric[local_start:local_end]
            if len(local_segment) < 3:
                continue
            try:
                # NOTE: signal.cwt and signal.ricker were deprecated in SciPy 1.12
                # and removed in 1.15; this call requires SciPy < 1.15 (or an
                # equivalent CWT, e.g. from PyWavelets).
                cwt_coeff = signal.cwt(local_segment, signal.ricker, widths)
            except Exception:
                continue
            max_coeff = np.max(np.abs(cwt_coeff))
            # Threshold for validating the candidate using local MAD.
            cwt_thresh = mad_val * np.sqrt(2 * np.log(len(local_segment) + eps))
            if max_coeff >= cwt_thresh:
                valid_times.append(times_arr[peak])
                valid_heights.append(metric[peak])
                valid_uncerts.append(uncertainties[peak])

        if len(valid_times) == 0:
            return np.array([]), np.array([]), np.array([])
        return np.array(valid_times), np.array(valid_heights), np.array(valid_uncerts)

    peak_times, peak_heights, peak_deltat = multi_resolution_thresholding(tf_metric, metric_times)
    return peak_times, peak_heights, peak_deltat
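
A minimal usage sketch for the function above (illustrative only: the sampling rate, GPS epoch, and white-noise strain below are synthetic placeholders, not real detector data; SciPy < 1.15 is needed for the signal.cwt call, as noted in the code):

import numpy as np

# Synthetic stand-in for 32 s of two-detector data at 4096 Hz (placeholder values).
fs = 4096
times = 1126259446.0 + np.arange(32 * fs) / fs   # GPS-like time stamps
rng = np.random.default_rng(0)
strain_h1 = 1e-21 * rng.normal(size=times.size)
strain_l1 = 1e-21 * rng.normal(size=times.size)

peak_times, peak_heights, peak_deltat = pipeline_v2(strain_h1, strain_l1, times)
for t, h, u in zip(peak_times, peak_heights, peak_deltat):
    print(f"candidate at GPS {t:.3f}  significance={h:.3g}  delta-t {u:.3g}")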

Interpretability Analysis

HW & ZL, arXiv:2508.03661

Out-of-distribution (OOD) detection

  • Generalization capability and robustness of the optimized algorithms


MCTS Depth-Stratified Performance Analysis.

  • Analyzed the relationship between MCTS tree depth and algorithm fitness across different optimization phases. The 10-layer MCTS structure was stratified into three depth groups: Depth I (depths 1-4), Depth II (depths 5-7), and Depth III (depths 8-10), representing shallow, intermediate, and deep exploration levels, respectively.

Algorithmic Component Impact Analysis.

  • A comprehensive analysis of each algorithmic component's impact, using a controlled comparative methodology


Interpretability Analysis

HW & ZL, arXiv:2508.03661

Please analyze the following Python code snippet for gravitational wave detection and
extract technical features in JSON format.

The code typically has three main stages:
1. Data Conditioning: preprocessing, filtering, whitening, etc.
2. Time-Frequency Analysis: spectrograms, FFT, wavelets, etc.
3. Trigger Analysis: peak detection, thresholding, validation, etc.

For each stage present in the code, extract:
- Technical methods used
- Libraries and functions called
- Algorithm complexity features
- Key parameters

Code to analyze:
```python
{code_snippet}
```

Please return a JSON object with this structure:
{
  "algorithm_id": "{algorithm_id}",
  "stages": {
    "data_conditioning": {
      "present": true/false,
      "techniques": ["technique1", "technique2"],
      "libraries": ["lib1", "lib2"],
      "functions": ["func1", "func2"],
      "parameters": {"param1": "value1"},
      "complexity": "low/medium/high"
    },
    "time_frequency_analysis": {...},
    "trigger_analysis": {...}
  },
  "overall_complexity": "low/medium/high",
  "total_lines": 0,
  "unique_libraries": ["lib1", "lib2"],
  "code_quality_score": 0.0
}

Only return the JSON object, no additional text.
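
A minimal sketch of how such a template might be filled and parsed (the helper names are hypothetical: `PROMPT_TEMPLATE` holds the prompt text above, and `call_llm` stands in for whichever chat-completion client is used):

import json

# Hypothetical glue code around the extraction prompt shown above.
PROMPT_TEMPLATE = "...the extraction prompt shown above..."

def extract_features(code_snippet: str, algorithm_id: str, call_llm) -> dict:
    # The template contains literal JSON braces, so str.format() would trip on
    # them; substitute the two placeholders directly instead.
    prompt = (PROMPT_TEMPLATE
              .replace("{code_snippet}", code_snippet)
              .replace("{algorithm_id}", algorithm_id))
    reply = call_llm(prompt)  # the prompt asks the model to return bare JSON
    return json.loads(reply)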


Interpretability Analysis

HW & ZL, arXiv:2508.03661

MCTS Algorithmic Evolution Pathway

  • Complete MCTS tree structure showing all nodes associated with the optimal algorithm (node 486, fitness=5041.4).



Interpretability Analysis

HW & ZL, arXiv:2508.03661

Edge robustness analysis for three critical evolutionary transitions.

  • The distributions demonstrate the stochastic nature of LLM-driven code generation while confirming the consistent discovery of high-performance algorithmic variants.
  • 52.8% of variants achieve superior fitness, with 100% inheriting Tikhonov regularization
  • 89.3% of variants exceed the preceding node's performance
  • 70.7% of variants outperform node 204; 25.0% surpass node 485


Integrated Architecture Validation

  • A comprehensive comparison of our integrated Evo-MCTS framework against its constituent components operating in isolation.
    • Evo-MCTS: MCTS + self-evolution + reflection mechanism
    • MCTS-AHD: MCTS framework for combinatorial optimization (arXiv:2501.08603)
    • ReEvo: evolutionary framework for combinatorial optimization (arXiv:2402.01145)

Contributions of knowledge synthesis

  • Comparison against the variant without external knowledge
    • non-linear vs. linear-only knowledge

LLM Model Selection and Robustness Analysis

  • Ablation study of different LLMs as the code generator, and of their robustness:
    • o3-mini-medium
    • o1-2024-12-17
    • gpt-4o-2024-11-20
    • claude-3-7-sonnet-20250219-thinking

59.1%

Framework Mechanism Analysis

HW & ZL, arXiv:2508.03661



### External Knowledge Integration
1. **Non-linear** Processing Core Concepts:
    - Signal Transformation: 
        * Non-linear vs linear decomposition
        * Adaptive threshold mechanisms
        * Multi-scale analysis
    
    - Feature Extraction:
        * Phase space reconstruction
        * Topological data analysis
        * Wavelet-based detection
    
    - Statistical Analysis:
        * Robust estimators
        * Non-Gaussian processes
        * Higher-order statistics

2. Implementation Principles:
    - Prioritize adaptive over fixed parameters
    - Consider local vs global characteristics
    - Balance computational cost with accuracy
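
As a concrete instance of the "robust estimators" item above, a minimal numpy sketch (synthetic data, illustrative only) comparing mean/std with median/MAD on glitchy, non-Gaussian noise:

import numpy as np

rng = np.random.default_rng(0)
# Gaussian background plus a few loud glitches, mimicking non-Gaussian detector noise.
x = rng.normal(0.0, 1.0, 4096)
x[rng.integers(0, x.size, 8)] += 50.0

# Classical estimators are dragged around by the outliers...
print(f"mean / std          : {x.mean():+.3f} / {x.std():.3f}")

# ...while the median and the scaled MAD stay anchored to the bulk.
med = np.median(x)
mad = np.median(np.abs(x - med))
print(f"median / 1.4826*MAD : {med:+.3f} / {1.4826 * mad:.3f}")

The 1.4826 factor rescales the MAD to the standard deviation of a Gaussian, which is exactly the convention used in the thresholding stage of the pipeline above.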

Key Takeaways


Key Challenge: How can we maintain the interpretability advantages of traditional models while leveraging the power of AI approaches?

Key Trust Factors:

  • Interpretable: Parameters have physical meaning
  • Built-in uncertainties: Input uncertainties propagate to outputs
  • Model selection: Balance simplicity with accuracy
  • Scientific insight: Reduces complexity, reveals principles

Motivation 1: Traditional methods heavily rely on manually designed filters and statistics.

Motivation 2: AI interpretability challenge: Discoveries vs. Validation.

Traditional Physics Approach: Data/Experience → Input → Human-Designed Algorithm (based on human insight) → Output. Examples: matched filtering, linear regression.

Black-Box AI Approach: Data/Experience → Input → AI Model (low interpretability) → Output. Examples: CNN, AlphaGo, DINGO.

Key Takeaways



Our Mission: To create transparent AI systems that combine physics-based interpretability with deep learning capabilities

Interpretable AI Approach (the best of both worlds): Data/Experience + Physics Knowledge → Input → Physics-Informed Algorithm, coupled with an AI Model (high interpretability) → Output. Examples: Evo-MCTS, AlphaEvolve. 🎯 OUR WORK



Key Takeaways

Any algorithm-design problem can be viewed as an optimization challenge

  • Many intermediate steps in scientific data processing, such as noise modeling and experimental design, can be cast as "algorithm optimization" problems
  • Several analytical modeling techniques and "symbolic regression" methods in theoretical physics and cosmology can likewise be framed as "algorithm optimization" problems


FYI:

  • AI-driven design of experiments. [Phys. Rev. X 15, 021012 (2025)]
  • RL design for multiple filters in LIGO control system. [Science (2025)]

num_of_audiences = 100  # define the loop bound so the snippet actually runs
for _ in range(num_of_audiences):
    print('Thank you for your attention! 🙏')

hewang@ucas.ac.cn

Acknowledgment:

 




Model Performance Evaluation and Tuning

# GW: DL

Model tuning, overfitting, and underfitting

  • Tuning follows a similar recipe: generate several candidate models, then pick one based on some evaluation (see the sketch below).
    • Algorithm parameters: usually set by hand, also known as "hyperparameters"
    • Model parameters: usually determined by learning
  • How well the parameters are tuned often has a decisive effect on final performance.
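
A minimal numpy sketch of this recipe (the dataset and candidate degrees are illustrative assumptions): the hyperparameter is the polynomial degree, set by hand and selected via cross-validation, while the coefficients are the model parameters learned on each training fold.

import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + 0.2 * rng.normal(size=x.size)

def cv_mse(degree, k=5):
    # 5-fold cross-validation error for one hyperparameter setting.
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = np.polyfit(x[train], y[train], degree)   # learned model parameters
        errs.append(np.mean((np.polyval(coef, x[test]) - y[test]) ** 2))
    return float(np.mean(errs))

best_degree = min(range(1, 10), key=cv_mse)
print("degree selected by 5-fold CV:", best_degree)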


Model Performance Evaluation and Tuning

# GW: DL

Model tuning, overfitting, and underfitting

  • When discussing how well a machine learning model learns and generalizes, two terms come up constantly: overfitting and underfitting.
  • Judging a model's generalization:
    • Overfitting: performs well on the training data but poorly on unseen data.
    • Underfitting: performs poorly on both the training data and unseen data.
    • Remedies: re-select the data, re-specify the model.
  • How do we pick the model?
    • Compare how models of different complexity score on the evaluation metric.

Source: DOI: 10.1177/2374289519873088

Model Performance Evaluation and Tuning

# GW: DL

No Free Lunch theorem

  • Averaged over all possible domains (all problem instances drawn from a uniform probability distribution), algorithms A and B have identical performance.

Wolpert, D. H. "The lack of a priori distinctions between learning algorithms." Neural Computation 8.7 (1996): 1341-1390.

What the No Free Lunch theorem means in practice (a toy illustration follows this list)

  • Always check your assumptions before relying on a model or search algorithm.
  • No "super algorithm" works perfectly on every dataset.
  • Almost every non-rote machine learning algorithm or statistical model makes assumptions about the relationship between predictors and targets. These assumptions introduce bias into the model, known specifically as inductive (learning) bias.
  • Bias-free learning is futile: a learner with no prior assumptions has no rational basis for producing estimates on new, unseen inputs.
  • These assumptions are why an algorithm can excel on some datasets and fail on others: its effectiveness depends on how well its bias (its assumptions) matches the true structure of the data. For any given algorithm, there will always be datasets it cannot handle well.
  • The fact that an algorithm's assumptions fit some datasets and not others is essential for understanding underfitting and the bias/variance tradeoff.
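
A toy numpy illustration (synthetic data, illustrative only): the same two learners swap places depending on whether their inductive bias matches the world that generated the data.

import numpy as np

rng = np.random.default_rng(1)

def fit_linear(x, y):
    # Least-squares straight line: a strong (linear) inductive bias.
    A = np.vstack([x, np.ones_like(x)]).T
    w = np.linalg.lstsq(A, y, rcond=None)[0]
    return lambda q: w[0] * q + w[1]

def fit_1nn(x, y):
    # 1-nearest-neighbour: almost no smoothness assumption.
    return lambda q: y[np.abs(x[:, None] - q[None, :]).argmin(axis=0)]

def mse(f, x, y):
    return float(np.mean((f(x) - y) ** 2))

x_tr, x_te = rng.uniform(-3, 3, 200), rng.uniform(-3, 3, 200)
for name, g in [("linear world", lambda x: 2 * x + 1),
                ("wiggly world", lambda x: np.sin(5 * x))]:
    y_tr = g(x_tr) + 0.1 * rng.normal(size=x_tr.size)
    lin, nn = fit_linear(x_tr, y_tr), fit_1nn(x_tr, y_tr)
    print(name, f"linear MSE={mse(lin, x_te, g(x_te)):.3f}",
          f"1-NN MSE={mse(nn, x_te, g(x_te)):.3f}")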

Model Performance Evaluation and Tuning

# GW: DL

The bias-variance dilemma

  • In general, bias and variance are in tension (a Monte Carlo decomposition is sketched below):
    • With too little training, the learner's fitting capacity is weak and bias dominates.
    • As training progresses, fitting capacity grows and variance gradually takes over.
    • Once training is ample, fitting capacity is very strong and variance dominates.

Generalization performance is jointly determined by the capacity of the learning algorithm, the sufficiency of the data, and the difficulty of the learning task itself.
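
A minimal Monte Carlo sketch of the decomposition (the synthetic target and parameters are illustrative assumptions): refit a polynomial on many resampled training sets, then separate the squared error of the mean prediction (bias²) from the spread across fits (variance).

import numpy as np

rng = np.random.default_rng(2)
true_f = lambda x: np.sin(2 * np.pi * x)
x_grid = np.linspace(0.0, 1.0, 50)

def bias_variance(degree, n_trials=200, n_train=30, noise=0.3):
    preds = np.empty((n_trials, x_grid.size))
    for t in range(n_trials):
        x = rng.uniform(0.0, 1.0, n_train)
        y = true_f(x) + noise * rng.normal(size=n_train)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x_grid)
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_grid)) ** 2)
    var = np.mean(preds.var(axis=0))
    return bias2, var

for d in (1, 3, 12):
    b2, v = bias_variance(d)
    print(f"degree {d:2d}: bias^2={b2:.3f}  variance={v:.3f}")

Low-degree fits are biased but stable; high-degree fits chase the noise, so variance dominates.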

Model Performance Evaluation and Tuning

# GW: DL

Model tuning, overfitting, and underfitting

Overfitting and underfitting are the two most common failure modes in machine learning (a train-vs-validation sketch follows this list).

  • Overfitting: the model performs very well on the training data but poorly on test or new data. An overfitted model is so complex that it even learns the noise in the training set. On a learning curve, overfitting typically shows up as training error that keeps falling while validation error starts to rise.
  • Remedies for overfitting include:
    • More data: additional data gives the model more signal to learn from and reduces the chance of overfitting.
    • Regularization: adding a penalty term keeps the weights from growing too large and lowers model complexity.
    • Early stopping: halting training once validation error starts to rise prevents the model from over-learning the training set.
    • Reducing model complexity: simplify the model, e.g., fewer layers or neurons in a neural network.
    • ...
  • Underfitting: the model performs poorly on both training and test data. An underfitted model is too simple to capture the patterns in the data. On a learning curve, underfitting shows up as both training and validation error staying high.
  • Remedies for underfitting include:
    • Increasing model complexity: add more features or use a more expressive model, e.g., more layers or neurons.
    • Less regularization: if the model is too simple, the regularization may be too strong; try relaxing it.
    • Switching models: if the current model cannot fit the data well, try a different model class.
    • ...
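
A minimal numpy sketch of the train-vs-validation diagnostic (synthetic data; the degrees are illustrative): the low degree underfits, the middle degree generalizes, and the high degree overfits.

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + 0.2 * rng.normal(size=x.size)
x_tr, y_tr, x_va, y_va = x[:25], y[:25], x[25:], y[25:]

for degree in (1, 4, 12):
    coef = np.polyfit(x_tr, y_tr, degree)
    err_tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    err_va = np.mean((np.polyval(coef, x_va) - y_va) ** 2)
    print(f"degree {degree:2d}: train MSE={err_tr:.3f}  val MSE={err_va:.3f}")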

Model Performance Evaluation and Tuning

# GW: DL

Model evaluation and selection

  • Comparison testing: once we have evaluation results under some metric, can we simply compare the numbers to decide which learner is better?
    • No! Because:
      • test performance is not the same as generalization performance;
      • test performance changes with the choice of test set;
      • many machine learning algorithms are themselves stochastic.
  • Machine learning tasks \(\rightarrow\) "probably approximately correct" (PAC) guarantees.
  • Statistical hypothesis tests provide a principled basis for comparing learner performance (conclusions should be backed by statistical significance); a worked paired-test example follows this list.

    • Comparing two learners:

      • cross-validated t-test (based on the paired t-test)

      • McNemar's test (based on a contingency table; chi-squared test)

    • Comparing multiple learners:

      • Kolmogorov-Smirnov test (K-S test)

      • Friedman test (rank-based, F-test; asks whether the learners perform the same)

      • Nemenyi post-hoc test (rank-based; further resolves pairwise differences)
Veitch, J., et al. Physical Review D 91, no. 4 (February 2015): 042003. https://doi.org/10.1103/PhysRevD.91.042003.

Fundamentals of Machine Learning for Gravitational Wave Search

By He Wang


2025/11/13 @FQCP2025 (https://indico.ictp-ap.ucas.ac.cn/event/4/)
