Evolutionary and Reinforcement Learning for
GW Data Analysis: from Large Language Model to Global Fitting

王赫(He Wang)

2026/04/19 | Chinese Physical Society, Division of Gravitation and Relativistic Astrophysics - 2026 Annual Academic Meeting

International Centre for Theoretical Physics Asia-Pacific (ICTP-AP)

University of Chinese Academy of Sciences  (UCAS)

hewang@ucas.ac.cn

References

  • HW, LZ, arXiv:2508.03661
  • BYZ, HW, LZ, arXiv:2602.08253
  • HW+ (In Preparation)

Abstract

Upcoming challenges such as MLGWSC2, currently at the proposal stage, provide a new testbed for exploring machine-learning–based approaches to gravitational-wave analysis. In this flash talk, I briefly introduce my core ideas and experience using evolutionary algorithms, Evo-MCTS, and reinforcement learning as adaptive search and optimization tools. I outline key methodological insights and discuss how these ideas may inform future GW analysis tasks, including potential applications to LISA.


AI for Science

In the second half of 2025, everyone started placing bets on AI for Science!

  • OpenAI Chief Product Officer (CPO) Kevin Weil officially announced on social media that OpenAI is launching a new initiative, OpenAI for Science, which aims to build the next generation of scientific tools: an AI-powered platform that accelerates scientific discovery.
  • On November 24, 2025, at the White House, Trump signed a major executive order, likened to an "AI Manhattan Project," formally launching the Genesis Mission. Its core goal: using AI to accelerate scientific breakthroughs!
    DeepMind: sending "AI co-scientists" directly into the national laboratories

  • NVIDIA: turning scientific problems into AI-infrastructure problems. In NVIDIA's telling, AI for Science is not just about "models" but an entire stack of compute platforms + simulation systems + automated experiments + engineered workflows; this is the physical foundation on which Genesis can stand.
  • In August 2025, China's State Council issued the Opinions on Deepening the Implementation of the "AI+" Initiative (Guo Fa [2025] No. 11), explicitly calling for accelerating AI-driven new research paradigms and speeding up major "0 to 1" scientific discoveries.

OpenAI plans to assemble a small team of top scholars who must meet three criteria:

  1. world-class standing in their research field;
  2. a deep commitment to the vision of AI;
  3. outstanding science-communication skills.
He Wang | ICTP-AP, UCAS
Towards Transparent AI in Gravitational Wave Data Analysis

Together, these moves show that AI-driven scientific discovery is shifting from academic exploration into a new phase of strategic competition.


The "Real" Reasons We Apply LLMs to Scientific Discovery

"Scientist"

"Collaborator"

"Evaluator"

Let's be honest about our motivations... 😉

AI and Cosmology: From Computational Tools to Scientific Discovery


Doesn't work directly? Then repackage it and run it through again.

npj Artif. Intell. 1, 14 (2025).

"Sequence output"

"Sequence input"

Direct fails. Refine and recover.​


Demo: An LLM verifies Kepler's three laws of planetary motion (2025.3)


Generative agents rely on predefined rules. 🤫

📄 Google DeepMind: "Scaling LLM Test-Time Compute Optimally" (arXiv:2408.03314)

🔗 OpenAI: Learning to Reason with LLMs

How can LLMs be used for scientific discovery?


Uncovering the "black box" to reveal how LLMs actually work


The evolution of GPT's capabilities

A close look at GPT-3.5's capabilities reveals where its emergent abilities come from:

  • The original GPT-3 acquired generation ability, world knowledge, and in-context learning from pre-training
  • Instruction-tuned models developed instruction following and generalization to unseen tasks
  • The code-trained model (code-davinci-002) gained code understanding
  • The capacity for complex reasoning is likely a by-product of code training

What are our thoughts on LLMs?

GPT-3.5 series [Source: University of Edinburgh, Allen Institute for AI]

GPT-3 (2020)

ChatGPT (2022)

Magic: Code + Text

What exactly makes LLMs so powerful?

Code!  (1/3)


What exactly makes LLMs so disruptive?

MCP

MCP Tool

prompt

"Please generate gw templates first."

The Model Context Protocol (MCP) is an open protocol released by Anthropic for integrating large language models with external data sources and tools, establishing secure, two-way connections between models and data sources.
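As a deliberately simplified illustration of the pattern, the sketch below mimics MCP-style tool registration with a plain Python registry. The decorator, the `generate_gw_templates` tool, and the toy chirp model are all hypothetical stand-ins, not the actual Anthropic SDK.

```python
import numpy as np

# Toy stand-in for an MCP-style tool registry (the real SDK exposes a
# similar decorator interface; all names here are illustrative).
TOOLS = {}

def tool(fn):
    """Register a function so the model can discover and call it by name."""
    TOOLS[fn.__name__] = {"fn": fn, "doc": fn.__doc__}
    return fn

@tool
def generate_gw_templates(m1: float, m2: float, fs: float = 4096.0) -> np.ndarray:
    """Return a crude chirp-like template for masses m1, m2 (toy model)."""
    t = np.arange(0, 1.0, 1.0 / fs)
    mchirp = (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2   # chirp mass
    f = 30.0 + 200.0 * mchirp * t**2                # toy frequency evolution
    return np.sin(2 * np.pi * f * t) * t            # rising amplitude

# The host would route a model request such as
# {"tool": "generate_gw_templates", "args": {"m1": 36, "m2": 29}}:
call = {"tool": "generate_gw_templates", "args": {"m1": 36.0, "m2": 29.0}}
template = TOOLS[call["tool"]]["fn"](**call["args"])
print(template.shape)  # (4096,)
```

The point of the protocol is exactly this indirection: the model never touches the data source directly, it only issues named, schema-checked tool calls.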

Demo: GW150914 MCP Signal Search


Rule-Based vs. LLMs (Source)

Natural Language Programming!  (2/3)

What exactly makes LLMs so disruptive?

MCP

RAG

Embodied AI


It's Mere Interpolation!  (3/3)

How do we actually explain the principles behind AI/LLMs?

The core driving force of AI4Sci largely lies in its “interpolation” generalization capabilities, showcasing its powerful complex modeling abilities.

Deep Learning is Not As Impressive As you Think, It's Mere Interpolation (Source)
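A minimal numpy sketch of this "interpolation in representation space" view: a fixed nonlinear feature map plus a linear fit answers queries inside the training range by interpolating between stored examples. The feature frequencies and target are illustrative choices, not from any cited work.

```python
import numpy as np

# "Interpolation" view in miniature: map inputs into a representation
# space, fit linearly there, and answer in-range queries by interpolation.
FREQS = np.array([0.7, 1.3, 2.1])

def features(x):
    # map scalar inputs to a 6-dim representation (sin/cos features)
    x = np.asarray(x)[:, None]
    return np.hstack([np.sin(x * FREQS), np.cos(x * FREQS)])

x_train = np.linspace(0.0, 2 * np.pi, 40)
y_train = np.sin(1.3 * x_train)          # target lies in the feature span

w, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)

x_query = np.array([1.0])                 # inside the training range
y_hat = features(x_query) @ w
print(y_hat[0], np.sin(1.3))              # near-identical: pure interpolation
```

Out-of-span targets or far-out-of-range queries are where this picture predicts failure, which is the substance of the "mere interpolation" critique.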


Representation Space Interpolation



  • [Signal search] Tested on beyond-GR (BGR) waveforms, the framework demonstrates generalization and robustness comparable to GR signal detection across different PN orders and luminosity distances.

GR

BGR

Yu-Xin Wang, Xiaotong Wei, Chun-Yue Li, Tian-Yang Sun, Shang-Jie Jin, He Wang*, Jing-Lei Cui, Jing-Fei Zhang, and Xin Zhang*. “Search for Exotic Gravitational Wave Signals beyond General Relativity Using Deep Learning.” PRD 112 (2), 024030. e-Print: arXiv:2410.20129 [gr-qc]

  • [Cosmology] The CVAE model efficiently compresses CMB power spectra into just 5 latent dimensions and reconstructs them with better than 99.9% fidelity within Planck uncertainties, remaining reliable even under parameter extrapolation.

~ sampling

Tian-Yang Sun, Tian-Nuo Li, He Wang*, Jing-Fei Zhang, Xin Zhang*. Conditional variational autoencoders for cosmological model discrimination and anomaly detection in cosmic microwave background power spectra. e-Print: arXiv:2510.27086 [astro-ph.CO]

By designing nonlinear mappings that represent scientific data in diverse feature spaces, we can improve modeling and inference for complex scientific problems.

The circles in the Nuremberg Chronicle are not idle scribbles: they are an early reader's notes, attempting to reconcile the two different chronological systems of the Septuagint (the Greek Old Testament) and the Hebrew Bible.

AI and archaeology:

Cap Set Problem

  • Given N, find the largest set of points in an N-dimensional grid such that no three of the points lie on a common line.
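The cap-set condition is easy to state in code: over \(\mathbb{F}_3^N\), three distinct points are collinear exactly when they sum to the zero vector mod 3. A small checker (illustrative, not from the FunSearch paper):

```python
from itertools import combinations, product

def is_cap_set(points):
    """True if no three distinct points of F_3^n are collinear.
    In F_3^n, distinct a, b, c lie on a line exactly when
    a + b + c == 0 (mod 3) coordinate-wise."""
    for a, b, c in combinations(points, 3):
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c)):
            return False
    return True

# N = 2: these four points form a maximum cap set (size 4).
cap2 = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(is_cap_set(cap2))  # True

# Any fifth point of the 3x3 grid completes a line.
others = [p for p in product(range(3), repeat=2) if p not in cap2]
print(any(is_cap_set(cap2 + [p]) for p in others))  # False
```

Exactly this kind of cheap, exact verifier is what lets an LLM-driven search over programs be trusted: every candidate construction can be checked mechanically.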

Bin Packing Problem

  • How to pack items of different sizes, arriving online, into the fewest possible bins.

The largest cap set in N=2 has size 4.

The largest cap set in N=3 has size 9 > \(2^3\)

For N > 6, the size of the largest cap set is unknown.

Discover new knowledge and efficient algorithms using AI


Illustrative example of bin packing using existing heuristic – Best-fit heuristic (left), and using a heuristic discovered by FunSearch (right).
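For reference, the best-fit baseline mentioned above fits in a few lines; this is a generic sketch of the classic heuristic, not DeepMind's code:

```python
def best_fit(items, capacity=1.0):
    """Online best-fit: place each item into the fullest bin that still fits,
    opening a new bin only when no existing bin has room."""
    bins = []  # remaining capacity per bin
    for item in items:
        # candidate bins that can hold the item
        candidates = [i for i, free in enumerate(bins) if free >= item]
        if candidates:
            i = min(candidates, key=lambda i: bins[i])  # tightest fit wins
            bins[i] -= item
        else:
            bins.append(capacity - item)
    return len(bins)

items = [0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1]
print(best_fit(items))  # 4 bins, which is optimal for this sequence
```

FunSearch's discovered heuristics replace the `min(..., key=...)` scoring rule with an evolved priority function that outperforms this baseline on the benchmarks.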

DeepMind Blog (Source)

LLM-guided search in "program" space


Real-world Case: FunSearch (Nature, 2023)

  • Google DeepMind's FunSearch system pairs LLMs with evaluators in an evolutionary process
  • Discovered new mathematical knowledge for the cap set problem in combinatorics, improving on best known bounds
  • Also created novel algorithms for online bin packing that outperform traditional methods
  • Demonstrates LLMs can make verifiable scientific discoveries beyond their training data
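Schematically, the FunSearch loop alternates "LLM proposes, evaluator scores, population keeps the best." The sketch below replaces both the LLM and the program space with trivial stand-ins (a random perturbation of a parameter vector) purely to show the control flow; nothing here is from the actual system.

```python
import random

def evaluate(program):
    """Score a candidate 'program' (here just a parameter vector).
    In FunSearch this runs the generated code on the target problem."""
    return -sum((p - t) ** 2 for p, t in zip(program, (3.0, -1.0, 2.0)))

def llm_propose(parent):
    """Stand-in for the LLM: perturb a high-scoring parent.
    The real system prompts a code model with best-so-far programs."""
    return [p + random.gauss(0, 0.3) for p in parent]

random.seed(0)
population = [[0.0, 0.0, 0.0]]
for _ in range(500):
    parent = max(population, key=evaluate)      # select a strong parent
    child = llm_propose(parent)                 # LLM-style mutation
    population.append(child)
    population = sorted(population, key=evaluate)[-20:]  # keep the best 20

best = max(population, key=evaluate)
print(evaluate(best))  # approaches 0 as candidates near the optimum
```

The essential design choice survives the simplification: the generator only needs to propose plausible candidates, because the evaluator supplies the ground truth.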

YouTube (Source)


Evaluation for MLGWSC-1 benchmark

Concept

Mechanism

problem → algorithm

data → algorithm → reward
↺ LLM-guided algorithm updates

LLM as designer

external_knowledge
(constraint)

from problem-solving to algorithm discovery

HW, LZ. arXiv:2508.03661 [cs.AI]

When LLMs Enter the Algorithmic Loop

The LLM does not predict answers — it reshapes how we search for algorithms.

LLMs act as policies over algorithms, not predictors of data.


import numpy as np
import scipy.signal as signal
def pipeline_v1(strain_h1: np.ndarray, strain_l1: np.ndarray, times: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    def data_conditioning(strain_h1: np.ndarray, strain_l1: np.ndarray, times: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
        window_length = 4096
        dt = times[1] - times[0]
        fs = 1.0 / dt
        
        def whiten_strain(strain):
            strain_zeromean = strain - np.mean(strain)
            freqs, psd = signal.welch(strain_zeromean, fs=fs, nperseg=window_length,
                                       window='hann', noverlap=window_length//2)
            smoothed_psd = np.convolve(psd, np.ones(32) / 32, mode='same')
            smoothed_psd = np.maximum(smoothed_psd, np.finfo(float).tiny)
            white_fft = np.fft.rfft(strain_zeromean) / np.sqrt(np.interp(np.fft.rfftfreq(len(strain_zeromean), d=dt), freqs, smoothed_psd))
            return np.fft.irfft(white_fft)

        whitened_h1 = whiten_strain(strain_h1)
        whitened_l1 = whiten_strain(strain_l1)
        
        return whitened_h1, whitened_l1, times
    
    def compute_metric_series(h1_data: np.ndarray, l1_data: np.ndarray, time_series: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        fs = 1 / (time_series[1] - time_series[0])
        f_h1, t_h1, Sxx_h1 = signal.spectrogram(h1_data, fs=fs, nperseg=256, noverlap=128, mode='magnitude', detrend=False)
        f_l1, t_l1, Sxx_l1 = signal.spectrogram(l1_data, fs=fs, nperseg=256, noverlap=128, mode='magnitude', detrend=False)
        tf_metric = np.mean((Sxx_h1**2 + Sxx_l1**2) / 2, axis=0)
        gps_mid_time = time_series[0] + (time_series[-1] - time_series[0]) / 2
        metric_times = gps_mid_time + (t_h1 - t_h1[-1] / 2)
        
        return tf_metric, metric_times

    def calculate_statistics(tf_metric, t_h1):
        background_level = np.median(tf_metric)
        peaks, _ = signal.find_peaks(tf_metric, height=background_level * 1.0, distance=2, prominence=background_level * 0.3)
        peak_times = t_h1[peaks]
        peak_heights = tf_metric[peaks]
        peak_deltat = np.full(len(peak_times), 10.0)  # Fixed uncertainty value
        return peak_times, peak_heights, peak_deltat

    whitened_h1, whitened_l1, data_times = data_conditioning(strain_h1, strain_l1, times)
    tf_metric, metric_times = compute_metric_series(whitened_h1, whitened_l1, data_times)
    peak_times, peak_heights, peak_deltat = calculate_statistics(tf_metric, metric_times)
    
    return peak_times, peak_heights, peak_deltat
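As a self-contained sanity check of the whitening stage above (re-implemented in minimal form here, run on synthetic colored noise rather than MLGWSC-1 data), the whitened output should have an approximately flat power spectrum:

```python
import numpy as np
import scipy.signal as signal

# Minimal re-implementation of pipeline_v1's whitening idea: divide the
# FFT by the square root of a Welch PSD estimate of the same data.
rng = np.random.default_rng(42)
fs = 2048.0
n = 16 * int(fs)
b, a = signal.butter(2, 100 / (fs / 2))              # shape the spectrum
colored = signal.lfilter(b, a, rng.standard_normal(n))

def whiten(strain, fs, nperseg=4096):
    strain = strain - np.mean(strain)
    freqs, psd = signal.welch(strain, fs=fs, nperseg=nperseg)
    psd = np.maximum(psd, np.finfo(float).tiny)
    fft = np.fft.rfft(strain)
    fbins = np.fft.rfftfreq(len(strain), d=1 / fs)
    return np.fft.irfft(fft / np.sqrt(np.interp(fbins, freqs, psd)), n=len(strain))

w = whiten(colored, fs)
f, p = signal.welch(w, fs=fs, nperseg=4096)
band = (f > 50) & (f < 900)
flat_ratio = p[band].max() / p[band].min()
print(flat_ratio)  # O(1): roughly flat spectrum after whitening
```

Before whitening, the low-pass-filtered noise spans several orders of magnitude over the same band; after whitening the residual variation is just the estimator's chi-squared scatter.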

Optimization Target: maximizing the area under the sensitivity curve (AUC) over the range of 1-1000 false alarms per year, balancing detection sensitivity against false-alarm rate across algorithm generations
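A toy sketch of this fitness function, with a synthetic sensitivity curve standing in for real pipeline output, integrating on a log10(FAR) axis:

```python
import numpy as np

# Toy sketch of the optimization target: area under a sensitivity-vs-FAR
# curve, integrated on a log10 axis over 1-1000 false alarms per year.
far = np.logspace(0, 3, 50)                       # false alarms per year
sens = 1.0 / (1.0 + (30.0 / far) ** 0.7)          # synthetic sensitivity curve

def auc_log_far(far, sens, lo=1.0, hi=1000.0):
    m = (far >= lo) & (far <= hi)
    x, y = np.log10(far[m]), sens[m]
    area = 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(x))   # trapezoid rule
    return area / (x[-1] - x[0])                         # normalize to [0, 1]

score = auc_log_far(far, sens)
print(score)  # single scalar reward driving the evolutionary search
```

Collapsing the whole operating curve into one scalar is what makes the evolutionary comparison between candidate pipelines well defined.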

MLGWSC-1 benchmark

HW, LZ. arXiv:2508.03661 [cs.AI]


PyCBC (linear-core)

cWB (nonlinear-core)

Simple filters (non-linear)

CNN-like (highly non-linear)

Benchmarking against state-of-the-art methods



You are an expert in gravitational wave signal detection algorithms. Your task is to design heuristics that can effectively solve optimization problems.

{prompt_task}

I have analyzed two algorithms and provided a reflection on their differences. 

[Worse code]
{worse_code}

[Better code]
{better_code}

[Reflection]
{reflection}

{external_knowledge}

Based on this reflection, please write an improved algorithm according to the reflection. 
First, describe the design idea and main steps of your algorithm in one sentence. The description must be inside a brace outside the code implementation. Next, implement it in Python as a function named '{func_name}'.
This function should accept {input_count} input(s): {joined_inputs}. The function should return {output_count} output(s): {joined_outputs}. 
{inout_inf} {other_inf}

Do not give additional explanations.

One Prompt Template for MLGWSC-1 Algorithm Synthesis


Algorithmic Synergy: MCTS, Evolution & LLM Agents


Monte Carlo Tree Search (MCTS)

Case 1: the game of Go

Case 2: OpenAI Strawberry (o1)

The release of o1 marked the move of the inference-time scaling paradigm into production. As Sutton argued in "The Bitter Lesson," learning and search are the only two techniques that scale without limit with compute. From this point on, the focus shifted to search.

Browne et al. (2012)

Monte Carlo Tree Search (MCTS) combines random simulation with tree search to optimize decisions, and has long been the core technique behind modern game-playing programs such as AlphaGo.
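At the heart of MCTS selection is the UCB1 rule, which trades off a node's mean value against an exploration bonus; a minimal illustration:

```python
import math

def ucb1(value_sum, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score used in MCTS selection: exploit (mean value) + explore bonus."""
    if visits == 0:
        return float("inf")          # unvisited children are tried first
    mean = value_sum / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)

# Parent visited 100 times; child A has the better mean, but child B is
# under-explored, so the exploration bonus makes B the next node to expand.
score_a = ucb1(value_sum=30.0, visits=50, parent_visits=100)   # mean 0.6
score_b = ucb1(value_sum=2.0, visits=5, parent_visits=100)     # mean 0.4
print(score_a, score_b)  # score_b exceeds score_a
```

Repeating select-expand-simulate-backpropagate with this rule concentrates computation on the most promising subtrees while still sampling neglected ones.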

LLM-Informed Evo-MCTS

  • Within each evolutionary iteration, Monte Carlo Tree Search (MCTS) decomposes complex signal detection problems into manageable decision sequences, enabling depth-wise and path-wise exploration of algorithmic possibilities.
  • We propose four evolutionary operations for MCTS expansion: Parent Crossover (PC) combines information from nodes at the parent level, Sibling Crossover (SC) exchanges features between nodes sharing the same parent, Point Mutation (PM) introduces random perturbations to individual nodes, and Path-wise Crossover (PWC) synthesizes information along complete trajectories from root to leaf.
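To make the four operators concrete, the toy sketch below applies them to candidates represented as plain parameter vectors. In the actual framework each operator is realized as an LLM prompt over code, so every function here is an illustrative stand-in, not the published implementation.

```python
import random

random.seed(1)

# Toy stand-in: each "algorithm" is a parameter vector; the real Evo-MCTS
# operators act on code via LLM prompts, this only mirrors their scope.
def point_mutation(node):                      # PM: perturb one candidate
    return [x + random.gauss(0, 0.1) for x in node]

def sibling_crossover(a, b):                   # SC: mix nodes sharing a parent
    return [random.choice(pair) for pair in zip(a, b)]

def parent_crossover(parent, node):            # PC: blend with the parent level
    return [(p + x) / 2 for p, x in zip(parent, node)]

def pathwise_crossover(path):                  # PWC: synthesize a root-to-leaf path
    return [sum(xs) / len(xs) for xs in zip(*path)]

root, mid, leaf = [0.0, 0.0], [0.5, 0.1], [0.9, 0.3]
children = [
    point_mutation(leaf),
    sibling_crossover(mid, leaf),
    parent_crossover(mid, leaf),
    pathwise_crossover([root, mid, leaf]),
]
print(children[-1])  # PWC averages information along the whole trajectory
```

The operators differ only in how much of the tree they consult: a single node (PM), one level (PC, SC), or an entire root-to-leaf trajectory (PWC).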


Evolutionary Algorithms (EA)

Evolutionary algorithms (EAs) are heuristic search methods that mimic the mechanisms of biological evolution (selection, crossover, mutation). Their main advantages include:
1. Strong global search capability
2. No gradient information required (broad applicability)
3. Good robustness and adaptivity
4. Natural parallelism (computational efficiency)
5. Well suited to multi-objective optimization problems
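The core mechanisms (selection, crossover, mutation) fit in a dozen lines; a generic toy EA maximizing a simple fitness, with all parameters chosen only for illustration:

```python
import random

random.seed(0)

def fitness(x):
    # Toy objective to maximize; the optimum is x = [1, 1, 1, 1]
    return -sum((xi - 1.0) ** 2 for xi in x)

def crossover(a, b):
    return [random.choice(g) for g in zip(a, b)]        # uniform crossover

def mutate(x, sigma=0.2):
    return [xi + random.gauss(0, sigma) for xi in x]    # Gaussian mutation

pop = [[random.uniform(-3, 3) for _ in range(4)] for _ in range(30)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                   # truncation selection
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(20)]

best = max(pop, key=fitness)
print(fitness(best))  # near 0: close to the optimum
```

Note that the loop never touches a gradient, which is exactly why the same skeleton transfers to non-differentiable search spaces such as programs.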

LLM-Driven Algorithmic Evolution Through Reflective Code Synthesis.

Monte Carlo Tree Search (MCTS) Algorithmic Evolution Pathway

What changed?

  • LLMs propose actions that guide the search

  • Evaluations (fitness/likelihood/...) become reusable memory

  • deepseek-R1 for reflection generation
  • o3-mini-medium for code generation

HW, LZ. arXiv:2508.03661 [cs.AI]


Search trajectories matter more than isolated optima.

Interpretability Analysis

Algorithmic Component Impact Analysis.

  • A comprehensive technique impact analysis using controlled comparative methodology
import numpy as np
import scipy.signal as signal
from scipy.signal.windows import tukey
from scipy.signal import savgol_filter

def pipeline_v2(strain_h1: np.ndarray, strain_l1: np.ndarray, times: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    The pipeline function processes gravitational wave data from the H1 and L1 detectors to identify potential gravitational wave signals.
    It takes strain_h1 and strain_l1 numpy arrays containing detector data, and times array with corresponding time points.
    The function returns a tuple of three numpy arrays: peak_times containing GPS times of identified events,
    peak_heights with significance values of each peak, and peak_deltat showing time window uncertainty for each peak.
    """
    eps = np.finfo(float).tiny
    dt = times[1] - times[0]
    fs = 1.0 / dt
    # Base spectrogram parameters
    base_nperseg = 256
    base_noverlap = base_nperseg // 2
    medfilt_kernel = 101       # odd kernel size for robust detrending
    uncertainty_window = 5     # half-window for local timing uncertainty

    # -------------------- Stage 1: Robust Baseline Detrending --------------------
    # Remove long-term trends using a median filter for each channel.
    detrended_h1 = strain_h1 - signal.medfilt(strain_h1, kernel_size=medfilt_kernel)
    detrended_l1 = strain_l1 - signal.medfilt(strain_l1, kernel_size=medfilt_kernel)

    # -------------------- Stage 2: Adaptive Whitening with Enhanced PSD Smoothing --------------------
    def adaptive_whitening(strain: np.ndarray) -> np.ndarray:
        # Center the signal.
        centered = strain - np.mean(strain)
        n_samples = len(centered)
        # Adaptive window length: between 5 and 30 seconds
        win_length_sec = np.clip(n_samples / fs / 20, 5, 30)
        nperseg_adapt = int(win_length_sec * fs)
        nperseg_adapt = max(10, min(nperseg_adapt, n_samples))
        
        # Create a Tukey window with 75% overlap.
        tukey_alpha = 0.25
        win = tukey(nperseg_adapt, alpha=tukey_alpha)
        noverlap_adapt = int(nperseg_adapt * 0.75)
        if noverlap_adapt >= nperseg_adapt:
            noverlap_adapt = nperseg_adapt - 1
        
        # Estimate the power spectral density (PSD) using Welch's method.
        freqs, psd = signal.welch(centered, fs=fs, nperseg=nperseg_adapt,
                                  noverlap=noverlap_adapt, window=win, detrend='constant')
        psd = np.maximum(psd, eps)
        
        # Compute relative differences for PSD stationarity measure.
        diff_arr = np.abs(np.diff(psd)) / (psd[:-1] + eps)
        # Smooth the derivative with a moving average.
        if len(diff_arr) >= 3:
            smooth_diff = np.convolve(diff_arr, np.ones(3)/3, mode='same')
        else:
            smooth_diff = diff_arr
        
        # Exponential smoothing (Kalman-like) with adaptive alpha using PSD stationarity.
        smoothed_psd = np.copy(psd)
        for i in range(1, len(psd)):
            # Adaptive smoothing coefficient: base 0.8 modified by local stationarity (±0.05)
            local_alpha = np.clip(0.8 - 0.05 * smooth_diff[min(i-1, len(smooth_diff)-1)], 0.75, 0.85)
            smoothed_psd[i] = local_alpha * smoothed_psd[i-1] + (1 - local_alpha) * psd[i]
            
        # Compute Tikhonov regularization gain based on deviation from median PSD.
        noise_baseline = np.median(smoothed_psd)
        raw_gain = (smoothed_psd / (noise_baseline + eps)) - 1.0
        
        # Compute a causal-like gradient using the Savitzky-Golay filter.
        win_len = 11 if len(smoothed_psd) >= 11 else ((len(smoothed_psd)//2)*2+1)
        polyorder = 2 if win_len > 2 else 1
        delta_freq = np.mean(np.diff(freqs))
        grad_psd = savgol_filter(smoothed_psd, win_len, polyorder, deriv=1, delta=delta_freq, mode='interp')
        
        # Nonlinear scaling via sigmoid to enhance gradient differences.
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
        scaling_factor = 1.0 + 2.0 * sigmoid(np.abs(grad_psd) / (np.median(smoothed_psd) + eps))
        
        # Compute adaptive gain factors with nonlinear scaling.
        gain = 1.0 - np.exp(-0.5 * scaling_factor * raw_gain)
        gain = np.clip(gain, -8.0, 8.0)
        
        # FFT-based whitening: interpolate gain and PSD onto FFT frequency bins.
        signal_fft = np.fft.rfft(centered)
        freq_bins = np.fft.rfftfreq(n_samples, d=dt)
        interp_gain = np.interp(freq_bins, freqs, gain, left=gain[0], right=gain[-1])
        interp_psd = np.interp(freq_bins, freqs, smoothed_psd, left=smoothed_psd[0], right=smoothed_psd[-1])
        denom = np.sqrt(interp_psd) * (np.abs(interp_gain) + eps)
        denom = np.maximum(denom, eps)
        white_fft = signal_fft / denom
        whitened = np.fft.irfft(white_fft, n=n_samples)
        return whitened

    # Whiten H1 and L1 channels using the adapted method.
    white_h1 = adaptive_whitening(detrended_h1)
    white_l1 = adaptive_whitening(detrended_l1)

    # -------------------- Stage 3: Coherent Time-Frequency Metric with Frequency-Conditioned Regularization --------------------
    def compute_coherent_metric(w1: np.ndarray, w2: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        # Compute complex spectrograms preserving phase information.
        f1, t_spec, Sxx1 = signal.spectrogram(w1, fs=fs, nperseg=base_nperseg,
                                              noverlap=base_noverlap, mode='complex', detrend=False)
        f2, t_spec2, Sxx2 = signal.spectrogram(w2, fs=fs, nperseg=base_nperseg,
                                               noverlap=base_noverlap, mode='complex', detrend=False)
        # Ensure common time axis length.
        common_len = min(len(t_spec), len(t_spec2))
        t_spec = t_spec[:common_len]
        Sxx1 = Sxx1[:, :common_len]
        Sxx2 = Sxx2[:, :common_len]
        
        # Compute phase differences and coherence between detectors.
        phase_diff = np.angle(Sxx1) - np.angle(Sxx2)
        phase_coherence = np.abs(np.cos(phase_diff))
        
        # Estimate median PSD per frequency bin from the spectrograms.
        psd1 = np.median(np.abs(Sxx1)**2, axis=1)
        psd2 = np.median(np.abs(Sxx2)**2, axis=1)
        
        # Frequency-conditioned regularization gain (reflection-guided).
        lambda_f = 0.5 * ((np.median(psd1) / (psd1 + eps)) + (np.median(psd2) / (psd2 + eps)))
        lambda_f = np.clip(lambda_f, 1e-4, 1e-2)
        # Regularization denominator integrating detector PSDs and lambda.
        reg_denom = (psd1[:, None] + psd2[:, None] + lambda_f[:, None] + eps)
        
        # Weighted phase coherence that balances phase alignment with noise levels.
        weighted_comp = phase_coherence / reg_denom
        
        # Compute axial (frequency) second derivatives as curvature estimates.
        d2_coh = np.gradient(np.gradient(phase_coherence, axis=0), axis=0)
        avg_curvature = np.mean(np.abs(d2_coh), axis=0)
        
        # Nonlinear activation boost using tanh for regions of high curvature.
        nonlinear_boost = np.tanh(5 * avg_curvature)
        linear_boost = 1.0 + 0.1 * avg_curvature
        
        # Cross-detector synergy: weight derived from global median consistency.
        novel_weight = np.mean((np.median(psd1) + np.median(psd2)) / (psd1[:, None] + psd2[:, None] + eps), axis=0)
        
        # Integrated time-frequency metric combining all enhancements.
        tf_metric = np.sum(weighted_comp * linear_boost * (1.0 + nonlinear_boost), axis=0) * novel_weight
        
        # Adjust the spectrogram time axis to account for window delay.
        metric_times = t_spec + times[0] + (base_nperseg / 2) / fs
        return tf_metric, metric_times

    tf_metric, metric_times = compute_coherent_metric(white_h1, white_l1)

    # -------------------- Stage 4: Multi-Resolution Thresholding with Octave-Spaced Dyadic Wavelet Validation --------------------
    def multi_resolution_thresholding(metric: np.ndarray, times_arr: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
        # Robust background estimation with median and MAD.
        bg_level = np.median(metric)
        mad_val = np.median(np.abs(metric - bg_level))
        robust_std = 1.4826 * mad_val
        threshold = bg_level + 1.5 * robust_std

        # Identify candidate peaks using prominence and minimum distance criteria.
        peaks, _ = signal.find_peaks(metric, height=threshold, distance=2, prominence=0.8 * robust_std)
        if peaks.size == 0:
            return np.array([]), np.array([]), np.array([])

        # Local uncertainty estimation using a Gaussian-weighted convolution.
        win_range = np.arange(-uncertainty_window, uncertainty_window + 1)
        sigma = uncertainty_window / 2.5
        gauss_kernel = np.exp(-0.5 * (win_range / sigma) ** 2)
        gauss_kernel /= np.sum(gauss_kernel)
        weighted_mean = np.convolve(metric, gauss_kernel, mode='same')
        weighted_sq = np.convolve(metric ** 2, gauss_kernel, mode='same')
        variances = np.maximum(weighted_sq - weighted_mean ** 2, 0.0)
        uncertainties = np.sqrt(variances)
        uncertainties = np.maximum(uncertainties, 0.01)

        valid_times = []
        valid_heights = []
        valid_uncerts = []
        n_metric = len(metric)

        # Compute a simple second derivative for local curvature checking.
        if n_metric > 2:
            second_deriv = np.diff(metric, n=2)
            second_deriv = np.pad(second_deriv, (1, 1), mode='edge')
        else:
            second_deriv = np.zeros_like(metric)

        # Use octave-spaced scales (dyadic wavelet validation) to validate peak significance.
        widths = np.arange(1, 9)  # approximate scales 1 to 8
        for peak in peaks:
            # Skip peaks lacking sufficient negative curvature.
            if second_deriv[peak] > -0.1 * robust_std:
                continue
            local_start = max(0, peak - uncertainty_window)
            local_end = min(n_metric, peak + uncertainty_window + 1)
            local_segment = metric[local_start:local_end]
            if len(local_segment) < 3:
                continue
            try:
                cwt_coeff = signal.cwt(local_segment, signal.ricker, widths)
            except Exception:
                continue
            max_coeff = np.max(np.abs(cwt_coeff))
            # Threshold for validating the candidate using local MAD.
            cwt_thresh = mad_val * np.sqrt(2 * np.log(len(local_segment) + eps))
            if max_coeff >= cwt_thresh:
                valid_times.append(times_arr[peak])
                valid_heights.append(metric[peak])
                valid_uncerts.append(uncertainties[peak])

        if len(valid_times) == 0:
            return np.array([]), np.array([]), np.array([])
        return np.array(valid_times), np.array(valid_heights), np.array(valid_uncerts)

    peak_times, peak_heights, peak_deltat = multi_resolution_thresholding(tf_metric, metric_times)
    return peak_times, peak_heights, peak_deltat
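
The excerpt above is a fragment of a longer generated function and cannot run on its own (`metric`, `times_arr`, `uncertainty_window`, and `eps` are defined earlier in the full listing). Its core idea, a robust MAD-based adaptive threshold fed into `scipy.signal.find_peaks`, can be sketched in isolation on synthetic data; the injection amplitudes and the 5σ-style cut below are illustrative assumptions, not values from the paper:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
metric = rng.normal(0.0, 1.0, 512)   # synthetic detection-statistic series
metric[100] += 12.0                  # two injected "events"
metric[300] += 10.0

# Robust background level and spread via the median absolute deviation (MAD);
# 1.4826 rescales the MAD to a Gaussian-equivalent standard deviation.
bg_level = np.median(metric)
robust_std = 1.4826 * np.median(np.abs(metric - bg_level))
threshold = bg_level + 5.0 * robust_std

# Candidate peaks must exceed the adaptive threshold and sit >= 2 samples apart.
peaks, _ = signal.find_peaks(metric, height=threshold, distance=2)
```

Note that `scipy.signal.cwt`, used for the wavelet validation step in the excerpt, was deprecated in recent SciPy releases; reproducing that part may require an older SciPy or a wavelet library such as PyWavelets.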
  • Automatically discover and interpret valuable nonlinear algorithms
  • Facilitate new knowledge production, guided by accumulated experience

PT Level 5

HW, LZ. arXiv:2508.03661 [cs.AI]

Framework Mechanism Analysis

Integrated Architecture Validation

  • A comprehensive comparison of our integrated Evo-MCTS framework against its constituent components operating in isolation.
    • Evo-MCTS: MCTS + self-evolution + reflection mechanism
    • MCTS-AHD: an MCTS framework for combinatorial optimization (CO)
    • ReEvo: an evolutionary framework for CO
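
For readers unfamiliar with the tree-search backbone these frameworks share, the standard UCT selection rule can be sketched generically as follows; this is plain UCB1/UCT as an illustration, not the paper's implementation, and `c_explore` is an assumed constant:

```python
import math

def uct_score(child_value_sum, child_visits, parent_visits, c_explore=1.4):
    """Generic UCB1/UCT score: exploitation term plus exploration bonus."""
    if child_visits == 0:
        return float("inf")  # always expand unvisited children first
    exploitation = child_value_sum / child_visits
    exploration = c_explore * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration

# Select the child maximizing the UCT score: (value_sum, visits) pairs.
children = [(3.0, 4), (1.0, 1), (0.0, 0)]
parent_visits = 5
best = max(range(len(children)),
           key=lambda i: uct_score(children[i][0], children[i][1], parent_visits))
```

The unvisited third child wins the selection here, which is exactly the exploration behavior that lets the search try new algorithm variants before exploiting known good ones.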

Contributions of knowledge synthesis

  • Comparison against the variant without external knowledge
    • non-linear vs linear-only

LLM Model Selection and Robustness Analysis

  • Ablation study of different LLMs serving as the code generator, and of their robustness.
    • o3-mini-medium
      o1-2024-12-17
      gpt-4o-2024-11-20
      claude-3-7-sonnet-20250219-thinking

59.1%

115%

HW, LZ. arXiv:2508.03661 [cs.AI]

Framework Mechanism Analysis

He Wang | ICTP-AP, UCAS

AI and Cosmology: From Computational Tools to Scientific Discovery

### External Knowledge Integration
1. **Non-linear** Processing Core Concepts:
    - Signal Transformation: 
        * Non-linear vs linear decomposition
        * Adaptive threshold mechanisms
        * Multi-scale analysis
    
    - Feature Extraction:
        * Phase space reconstruction
        * Topological data analysis
        * Wavelet-based detection
    
    - Statistical Analysis:
        * Robust estimators
        * Non-Gaussian processes
        * Higher-order statistics

2. Implementation Principles:
    - Prioritize adaptive over fixed parameters
    - Consider local vs global characteristics
    - Balance computational cost with accuracy

HW, LZ. arXiv:2508.03661 [cs.AI]

Interpretable AI Approach

The best of both worlds

Interpretable AI approach (🎯 our work): Input → Physics-Informed Algorithm (high interpretability) → Output
  • Combines an AI model with physics knowledge, learning from data/experience
  • Examples: Evo-MCTS, AlphaEvolve

Traditional physics approach: Input → Human-Designed Algorithm (based on human insight) → Output
  • Examples: matched filtering, linear regression

Black-box AI approach: Input → AI Model (low interpretability) → Output
  • Learns from data/experience
  • Examples: CNN, AlphaGo, DINGO

Scientific discovery requires interpretability, not just performance.

Interpretable AI for Gravitational-Wave Discovery

Scientific discovery requires interpretability — not just performance.

Key Takeaways: ... against Symbolic Regression

vs

See: Parallel Session 5, Room 5-104
17:50–18:00, 孙天阳 (Tianyang Sun): Cosmological Model Optimization Based on Symbolic Regression

Any algorithm-design problem can be viewed as an optimization problem

  • Many intermediate stages of GW data processing can be treated as "algorithm optimization" problems, e.g. filter design, noise modeling, and the construction of detection statistics
  • Many analytic-modeling and "symbolic regression" methods in theoretical physics and cosmology can likewise be treated as "algorithm optimization" problems
    • symbolic regression vs algorithm optimization
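
To make the optimization framing concrete, here is a toy random search over a one-parameter family of detection algorithms; the detector family, scoring function, and search grid are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data():
    """Synthetic data: Gaussian noise with one injected signal at index 500."""
    x = rng.normal(0.0, 1.0, 1000)
    x[500] += 6.0
    return x

def candidate_algorithm(k):
    """A one-parameter family of detection algorithms: threshold at k * MAD."""
    def detect(x):
        mad = 1.4826 * np.median(np.abs(x - np.median(x)))
        return np.flatnonzero(x > np.median(x) + k * mad)
    return detect

def score(detect, n_trials=20):
    """Reward = detections of the true injection minus a false-alarm penalty."""
    s = 0.0
    for _ in range(n_trials):
        hits = detect(make_data())
        found = 500 in hits
        s += found - 0.1 * (len(hits) - found)
    return s / n_trials

# "Algorithm optimization": search the design space, keep the best design.
candidates = [1.0, 2.0, 3.0, 4.0, 5.0]
best_k = max(candidates, key=lambda k: score(candidate_algorithm(k)))
```

In Evo-MCTS the search space is the space of programs and the proposal mechanism is an LLM, but the skeleton, propose a candidate algorithm, score it on data, keep the best, is the same.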


  • Other examples of optimization problems:
    • AI-driven design of experiments [Phys. Rev. X 15, 021012 (2025)]
    • RL-based design of multiple filters in the LIGO control system [Science (2025)]

Traditional Physics

✓ Fully interpretable
✗ Performance ceiling

Human-designed pipelines
Fixed heuristics

Examples:
Matched filtering
χ² tests

Black-box AI

✓ High performance
✗ Opaque decisions

End-to-end prediction
Model-centric learning

Examples:
CNNs, DINGO

Interpretable
Algorithmic Discovery

Algorithms as search objects
Physics-informed objectives

Performance:
Competitive with state-of-the-art
(MLGWSC-1 benchmark)

Example:
Evo-MCTS (this work)
AlphaEvolve

Interpretable AI for Gravitational-Wave Discovery

Scientific discovery requires interpretability — not just performance.

AI should help us understand why an algorithm works — not just output an answer.

Benchmarking against state-of-the-art methods (MLGWSC-1):

  • PyCBC (linear core)
  • cWB (nonlinear core)
  • Simple filters (non-linear)
  • CNN-like (highly non-linear)

HW, LZ. arXiv:2508.03661 [cs.AI]

From Black-Box AI to Algorithmic Co-Design

LLMs as agents that optimize physics-based algorithms

A new axis: adaptivity over algorithm design

LLMs allow us to search over algorithms, not just over parameters.

  • Numerical orbits (of Taiji)
  • Unequal-arm
  • TDI-2.0

MH Du+, arXiv:2505.16500 [gr-qc]

The Global Fitting Challenge

Analyze Non-overlapping Galactic Binaries

The analysis of the best currently known LISA binaries, even making maximal use of the available information about the sources, is susceptible to ambiguity or biases when not simultaneously fitting to the rest of the galactic population. (quoted from Littenberg et al., arXiv:2404.03046)

credit: Karnesis et al, arXiv:2303.02164v2

credit: M. Katz, The 15th International LISA Symposium


MDP Element   GW Interpretation
State         Residuals, PSD drift, candidate list
Action        Propose / subtract / refine / allocate compute
Reward        Evidence gain, residual stationarity
Horizon       Entire observing run
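
The correspondence above might be sketched as a minimal environment interface; the class names, the scalar "residual power" state, and the reward rule are hypothetical placeholders, not the in-preparation implementation:

```python
from dataclasses import dataclass, field

@dataclass
class GlobalFitState:
    """State: what the fitter observes between decisions."""
    residual_power: float   # stand-in for residual time-series statistics
    n_candidates: int       # length of the current candidate-source list

@dataclass
class GlobalFitEnv:
    """Toy MDP: each 'subtract' action removes a candidate source."""
    state: GlobalFitState = field(
        default_factory=lambda: GlobalFitState(residual_power=10.0, n_candidates=5))

    def step(self, action: str):
        s = self.state
        if action == "subtract" and s.n_candidates > 0:
            # Removing a modeled source lowers residual power (evidence gain).
            reward = 0.2 * s.residual_power
            self.state = GlobalFitState(s.residual_power - reward, s.n_candidates - 1)
        else:
            reward = 0.0  # refine/allocate-compute actions omitted from this sketch
        return self.state, reward

env = GlobalFitEnv()
total = 0.0
for _ in range(3):   # a short horizon of modeling actions
    _, r = env.step("subtract")
    total += r
```

The diminishing per-step reward captures the long-horizon character of the problem: early subtractions are cheap wins, while later decisions must trade evidence gain against compute.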

A trajectory tree of global-fitting decisions over time

Nodes: residual states
Edges: modeling actions

(HW+, in preparation)

Global fitting as a Markov Decision Process (MDP)

A RL Perspective on Global Fitting

Global fitting is not a single inference — it is a long-horizon control problem.

I don’t claim this is solved. I claim the framing matters.


See: Parallel Session 8, Room 5-105
18:00–18:10, 张丁锴 (Dingkai Zhang): Learning Null Channels for Instrumental Noise Characterization in Taiji


GW Data Analysis as a Markov Decision Process

Many GW pipelines already define an MDP — implicitly and inconsistently.

“Once you phrase the problem this way,
RL and MCTS are not exotic — they are obvious.”

Key Takeaways

  • What is the right reward for discovery?
  • When does adaptivity beat optimality?

for _ in range(num_of_audiences):
    print('Thank you for your attention! 🙏')

Spring School on Data Simulation and Analysis for Space-Based Gravitational-Wave Detection
(First Announcement)

Space-based gravitational-wave detection is listed as a priority direction under the "Ripples in Spacetime" theme of the National Medium- and Long-Term Development Plan for Space Science (2024–2050). To accelerate the development of China's independent data-simulation and data-analysis systems for space-based gravitational-wave detection, to support science-objective studies and the construction of the science application system, and to identify and train young researchers, the Spring School on Data Simulation and Analysis for Space-Based Gravitational-Wave Detection will be held on April 23–27, 2026 at the Innovation Academy for Microsatellites, Chinese Academy of Sciences.

  • Syllabus: simulation of space-based GW detection systems (orbital dynamics, drag-free control, optics, laser links, GW signal response, etc.) | GW data preprocessing (time-delay interferometry, etc.) | analysis of target sources for space-based detection (massive black-hole binaries, stellar-mass binary black holes, extreme-mass-ratio inspirals, Galactic compact binaries) | AI-based data-analysis methods | waveform models and frontiers of GW physics
  • Venue: Innovation Academy for Microsatellites, CAS (Zhangjiang campus) + Tencent Meeting
  • Course inquiries: 杜明辉 (Minghui Du), duminghui@imech.ac.cn
  • Logistics: 张江朋 18621820740; 唐奇芹 18221006158

Taiji Laboratory 2026 "Undergraduate Innovation and Practice Training Program"

(Gravitational-wave data analysis and AI for Science)

The gravitational-wave data analysis and machine learning group of the Taiji Laboratory for Gravitational Wave Universe (Beijing), University of Chinese Academy of Sciences, offers research-training opportunities to university students nationwide, and is now recruiting outstanding students for the Taiji Laboratory 2026 "Undergraduate Innovation and Practice Training Program". The group pursues interdisciplinary research at the intersection of gravitational-wave astronomy, numerical simulation, and artificial intelligence, with a focus on developing next-generation AI for Science methods for modeling complex physical systems, signal processing, and scientific data analysis. Students with a strong interest in gravitational-wave science, AI algorithms, and scientific computing are welcome to join, receive systematic training through real research projects, and take part in frontier international research.

  • In addition, the laboratory welcomes inquiries from students interested in related research directions at any time; this call remains open indefinitely.

Open Questions for the Community

  • What is the right reward for discovery?
  • Should we train ensembles instead of curating them?
  • When does adaptivity beat optimality?

for _ in range(num_of_audiences):
    print('Thank you for your attention! 🙏')

Call for Speakers - MLA F2F @ March LVK 2026 (Pisa)

Just a gentle reminder that we’re collecting contributions for the Machine Learning Algorithms (MLA) section!

Evolutionary and Reinforcement Learning for GW Data Analysis: from Large Language Model to Global Fitting

By He Wang


2026/4/19 @ 2026 Annual Gravitation Conference (Hainan) | https://s.31url.cn/bRlrl9jl
