Changepoint Analysis Status

2025-10-27

What is Changepoint Analysis?

A family of computational techniques for finding points in time-series data where the behavior of the data changes (hence the name "changepoint").

Changepoint analysis assumes that time-series data are drawn piecewise from stationary distributions. For convenience we'll also assume data are read at a consistent dt. (Not strictly needed, but the math becomes a lot cleaner.)

At certain key times \( \{t_i^*\} \), the data switch between different distributions.

[Figure: a time series drawn piecewise from different stationary distributions, with switches at \( t_1^* \) and \( t_2^* \)]

Of course, we don't get to see what the distributions are or which distribution the points come from!

Goal: Given unlabeled data points, infer the points \( \{t_i^*\} \) where the data switch between distributions.

Typical Approach to Changepoint Analysis

Packages like ruptures, Changepoints.jl, etc.

C. Truong, L. Oudre, N. Vayatis. Selective review of offline change point detection methods. Signal Processing, 167:107299, 2020.

Overall idea

  1. Write down some goodness-of-fit function for a segment that starts at \(s\) and ends at \(t\).
  2. Optimize over the possible \( \{t_i^*\} \) to find the best possible segmentation.

Stupidly simple \((s,t)\) cost function

c(s,t) = \sum_{i=s+1}^t \left(x_i - \bar{x}_{s,t} \right)^2
\bar{x}_{s,t} = \frac{\sum_{i=s+1}^t x_i}{t-s}

This implicitly assumes a Gaussian with fixed variance and unknown mean: up to constants, \( c(s,t) \) is the negative log-likelihood of such a model evaluated at the maximum-likelihood estimate of the mean. Other cost functions assume unknown mean and variance, Poisson-distributed counts, etc.
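As a concrete reference, here is a minimal Julia sketch of this quadratic cost. It is illustrative only; the name segment_cost and the exact indexing convention are my own, not taken from the analysis code.

# Quadratic cost of the segment covering samples s+1 through t:
# the sum of squared deviations from the segment mean.
function segment_cost(x::AbstractVector, s::Integer, t::Integer)
    seg = @view x[s+1:t]
    μ = sum(seg) / (t - s)
    return sum(v -> (v - μ)^2, seg)
end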


To find the overall cost of some proposed segmentation \( \{t_i\} \), we compute:

V(x, \{t_i\}) = \sum_{i=1}^{K+1} c(t_{i-1}, t_i) + \lambda |\{t_i\}|

where \( K = |\{t_i\}| \) is the number of changepoints, \( t_0 = 0 \), and \( t_{K+1} = n \) is the end of the data. The \( \lambda |\{t_i\}| \) term is a penalty to prevent adding too many changepoints.
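A corresponding sketch of the total cost, assuming the segment_cost helper above (the name total_cost and the boundary handling are my own conventions):

# Total segmentation cost V(x, {t_i}) for a sorted vector of changepoints `cps`,
# using the boundaries t_0 = 0 and t_{K+1} = length(x).
function total_cost(x::AbstractVector, cps::AbstractVector{<:Integer}, λ::Real)
    bounds = [0; cps; length(x)]
    V = sum(segment_cost(x, bounds[i-1], bounds[i]) for i in 2:length(bounds))
    return V + λ * length(cps)
end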

Example

[Figure: a synthetic trace segmented into pieces; each segment's mean \( \bar{x}_{s,t} \) is drawn, the per-segment cost \( c(s,t) \) measures squared deviation from that mean, and the total cost \( V(x, \{t_i\}) \) adds the penalty \( \lambda |\{t_i\}| \)]

Use search algorithms (e.g. greedy binary segmentation, dynamic programming, PELT) to find a "good" set of changepoints. Global optimality is often infeasible.
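For concreteness, here is a rough Julia sketch of greedy binary segmentation built on the quadratic cost from earlier. It illustrates the idea only and is not the ruptures or Changepoints.jl implementation; the function name, keyword arguments, and min_size default are my own choices.

# Recursively split (s, t] at the point that most reduces the quadratic cost;
# stop when the best reduction no longer beats the penalty λ.
function binary_segmentation(x::AbstractVector, λ::Real; s = 0, t = length(x), min_size = 2)
    parent = segment_cost(x, s, t)
    best_gain, best_k = 0.0, 0
    for k in (s + min_size):(t - min_size)
        gain = parent - segment_cost(x, s, k) - segment_cost(x, k, t)
        if gain > best_gain
            best_gain, best_k = gain, k
        end
    end
    best_gain <= λ && return Int[]   # splitting here is not worth the penalty
    left  = binary_segmentation(x, λ; s = s, t = best_k, min_size = min_size)
    right = binary_segmentation(x, λ; s = best_k, t = t, min_size = min_size)
    return sort(vcat(left, [best_k], right))
end

Calling binary_segmentation(data, λ) returns the sorted changepoint indices for a single trace.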

How to select \( \lambda \)?

A work in progress.

 

Bayesian Information Criterion has proven helpful to get a starting point, but tends to underestimate the number of changepoints in the true data.
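As a point of reference (this is a standard convention I am assuming, not necessarily the exact form used here), a BIC-style penalty for the quadratic cost above scales like

\lambda_{\text{BIC}} \sim \hat{\sigma}^2 \log n

where \( \hat{\sigma}^2 \) is an estimate of the per-sample noise variance and \( n \) is the trace length; constant factors of 2 or the data dimension are sometimes included.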

 

Possible to target a known mean transition rate in the data.

  • However, since some transitions are too short to be visible to the detector, matching the true rate will necessarily overcount the number of "visible" transitions in the data.
  • The above point might not be a problem! As long as the extra changepoints are placed at "reasonable" locations, clustering may sort them out.

Comparison with HMM

Inputs:
  • HMM: Photon data, a kinetic scheme, and an emission model (the latter two are usually unknown parameters plus initial guesses)
  • Changepoint: Data and an emission model

Outputs (Phase 1):
  • HMM: Kinetic rates
  • Changepoint: Changepoints

Outputs (Phase 2):
  • HMM: Viterbi assignment of states to the input data
  • Changepoint: Assignment of changepoints to states

Current Status of Changepoint

Works great on Synthetic Data!

Synthetic process:

  • Assume that each state generates photons from a Poisson distribution with offset from background.
  • Poisson mean rates are selected randomly, while enforcing Rate(Low) < Rate(Med) < Rate(Hi).

[Figure: synthetic traces with the triggering trajectory and the recovered trajectory overlaid; states shorter than 0.25 s are marked with an ×]

Works great on Dataset 04!

But if we eyeball the photon data, the missed transitions seem totally unrecoverable.

More Finicky Datasets

Highly Correlated Traces

For each pair of traces \((i, j)\), compute the correlation between trace[i] and trace[j].

This gives us a square matrix consisting of all the pairwise correlations between different traces.

We can plot how often each correlation occurs to give us an idea of how much the different photon trajectories agree with each other.

[Figure: histogram of pairwise correlations between traces; x-axis truncated for ease of viewing]
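A minimal Julia sketch of this computation (the layout with one trace per column and the function name are assumptions on my part, not the actual analysis code):

using Statistics

# Pairwise Pearson correlations between traces (one trace per column),
# returning only the upper-triangle entries so the 1.0 self-correlations
# do not dominate a histogram of the values.
function pairwise_correlations(traces::AbstractMatrix)
    C = cor(traces)          # correlation of every column with every other column
    n = size(C, 1)
    return [C[i, j] for i in 1:n for j in i+1:n]
end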

What's going on here?

In Dataset 02, there is a set of 67 traces that are all mutually correlated with each other at coefficients > 0.92.

Even though the correlations in Dataset 04 are on average much higher, its largest set of traces mutually correlated at > 0.92 has only 9 traces.

 

The correlated traces in Dataset 02 are visible even when plotting the data with high transparency: each line in the correlated set stacks to form a darker line.

Why, in a dataset where most traces are uncorrelated or weakly correlated, is there a small subset of traces that are this strongly correlated?

 

Should I try to throw some of them out? Weigh them more heavily when doing analysis? Pretend that there's nothing special about them?

If we throw out the highly correlated traces, we can get rid of the peak of high correlations, but now we are left with a bunch of uncorrelated data.

Filter by selecting connected components

The perfect 1.0 values are caused by including self-correlations (the diagonal of the correlation matrix).
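A sketch of one way to implement this filter: treat traces as graph nodes, add an edge when the pairwise correlation exceeds a threshold, and take connected components with a union-find. The 0.92 default echoes the discussion above; everything else (names, the union-find approach) is my own illustration. Note that connected components are a looser grouping than the mutually correlated sets described earlier, since membership can be transitive.

# Group trace indices into connected components of the graph whose edges are
# pairs with correlation above `thresh`. `C` is the full correlation matrix.
function correlated_components(C::AbstractMatrix; thresh = 0.92)
    n = size(C, 1)
    parent = collect(1:n)
    find(i) = parent[i] == i ? i : (parent[i] = find(parent[i]))   # path compression
    for i in 1:n, j in (i+1):n
        C[i, j] > thresh && (parent[find(i)] = find(j))            # union the two components
    end
    groups = Dict{Int,Vector{Int}}()
    for i in 1:n
        push!(get!(groups, find(i), Int[]), i)
    end
    return collect(values(groups))
end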

Why are most of our data uncorrelated? Should this be expected?

Synthetic Tests for Correlation

We can control the variance present in the synthetic data generation scheme by altering the ratio between background and signal.

In synthetic data generated so far, background is ~500k, while signal is +10-15k over background.

Leads to data and correlation plots like the following:

[Figure: data and correlation plots for synthetic data (top) vs. experimental Dataset 04 (bottom)]

Synthetic data less correlated than experimental!

Increase background while keeping signal at +10k above background

[Figure: correlation distributions as the background grows: BG = 500,000 → 1,500,000 → 10,000,000 → 100,000,000, compared against Dataset 02]

Conclusion

Dataset 02 looks very similar to data generated from a Poisson process where the background is 100,000,000 counts per interval and the signal is 100,010,000 counts per interval: a signal of about one part in 10,000.

[Figure: correlation distribution of BG = 100,000,000 synthetic data overlaid with Dataset 02]

Dataset 02 is a signal with a mean of approximately 12,000 counts and a standard deviation of approximately 11,500 counts.
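As a rough consistency check (my own arithmetic, assuming the \( \sqrt{2} \) multiplicative-noise factor described later in this deck also applies here):

\sigma_{\text{Poisson}} = \sqrt{10^8} = 10{,}000, \qquad 2^{1/4} \times 10{,}000 \approx 11{,}900

which is close to the observed standard deviation of roughly 11,500, while the signal itself is only about 10,000-12,000 counts, i.e. a per-sample SNR of roughly 1.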

How well do Changepoints work on these SNRs?

Basically Random


Randomness comes from state-assignment algorithm, which uses a randomly-permuted k-means to assign initial clusters.

Hypothesis: HMMs can still extract some information from this because they build probabilities over all possible trajectories.

Changepoint-based analysis first tries to establish a single, canonical trajectory (basically placing a delta function on the space of possible trajectories) and then infers entropy production from that.

This idea is based on pretty much zero concrete evidence. So is it true?

Proposal: Stress-Test HMM Capabilities with Known Synthetic Data

Create state trajectories from various systems (e.g. two track, sfd, three state unidirectional, three state biased, three state unbiased).

Then noise them at different levels:

  • BG = 5e5, as done for current synthetic trajectories
  • BG = 3e6, where transitions may be lost in noise.
  • BG = 1e8, where data is mostly noise.

If HMMs can succeed in all three cases, then they are clearly much more capable than changepoints in high-noise environments.

Other ideas

Can BNP-step produce better changepoint assignments?

 

Probably not.

 

BUT it can give us posterior distributions over likely change points. Can we use this to compute our own ensemble entropy production?

 

First impression is that this seems like way too much computation.

Trajectory Generation Process

  • Choose a background value \( B \) between 450k and 700k.
  • Choose \( \lambda_\ell \) uniformly from \( [9000, 11000] \).
  • Choose \( \delta_1, \delta_2 \) uniformly from \( [1000, 3000] \).
  • Set \( \lambda_1 = B + \lambda_\ell \).
  • Set \( \lambda_2 = \lambda_1 + \delta_1 \).
  • Set \( \lambda_3 = \lambda_2 + \delta_2 \).
  • Start at \( t = 0 \) and, for each timestep of \( dt = 0.25 \):
    • If the system stays in level \( L \) for the entire timestep, sample a value from \( \text{Poisson}(\lambda_L) \).
    • Otherwise, build \( \lambda_{\text{new}} \) as the average of the rates of the states the system visits during the timestep, weighted by how long it spends in each, and sample from \( \text{Poisson}(\lambda_{\text{new}}) \).
  • Amplify the variance of this sample by perturbing it away from its expected mean (see next slide).
  • Subtract off the background value \( B \).
  • Repeat for the desired number of sample trajectories.

Note: this procedure requires a state trajectory as input: we need to know which state the system is in at any given time. (A sketch of the full procedure follows below.)
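A minimal Julia sketch of this procedure, under some assumptions of mine: the state trajectory is represented as a vector of (dwell time, level index) pairs, and the names binned_rates and generate_trace are illustrative rather than the actual code. The variance-amplification step is inlined here; the deck's sample_scaled_poisson function on the next slide plays the same role.

using Distributions, Random

# Time-weighted mean rate λ_new for each dt-wide bin, given dwell segments
# (duration, level index) and per-level rates λ.
function binned_rates(dwells::Vector{<:Tuple{<:Real,<:Integer}}, λ::Vector{Float64}, dt::Real)
    ends = cumsum(first.(dwells))            # cumulative end time of each dwell
    nbins = floor(Int, ends[end] / dt)
    rates = zeros(nbins)
    for b in 1:nbins
        lo, hi = (b - 1) * dt, b * dt
        acc = 0.0
        for (k, (dur, level)) in enumerate(dwells)
            overlap = min(hi, ends[k]) - max(lo, ends[k] - dur)
            overlap > 0 && (acc += overlap * λ[level])
        end
        rates[b] = acc / dt                  # average rate, weighted by time in each state
    end
    return rates
end

# One observed trace: pick B and the level rates, draw Poisson counts per bin,
# amplify the variance by √2, and subtract the background.
function generate_trace(dwells; dt = 0.25)
    B  = rand(450_000:700_000)                       # background counts per bin
    λℓ = rand(Uniform(9_000.0, 11_000.0))
    δ1, δ2 = rand(Uniform(1_000.0, 3_000.0), 2)
    λ  = [B + λℓ, B + λℓ + δ1, B + λℓ + δ1 + δ2]     # enforces λ₁ < λ₂ < λ₃
    return map(binned_rates(dwells, λ, dt)) do r
        raw = rand(Poisson(r))
        r + 2^0.25 * (raw - r) - B                   # √2 variance inflation, then subtract B
    end
end

# Example: a hypothetical state trajectory with dwell times in seconds
trace = generate_trace([(1.0, 1), (0.75, 3), (1.25, 2), (0.5, 1)])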

Sampling a Sensor-Read Poisson Value

According to the instrument manual, there is an additional multiplicative noise factor which increases the variance of a steady-state signal by a factor of \( \sqrt{2} \) without affecting the mean.

 

To simulate this, since we know the actual mean of the distribution, we subtract the mean from the observation, scale the difference by \( \sqrt{\sqrt{2}} \approx 1.19 \) (so the variance grows by a factor of \( \sqrt{2} \)), and add the scaled difference back to the mean.

using Distributions

function sample_scaled_poisson(p::Poisson)
    scale = sqrt(2)                      # target variance-inflation factor
    λ = mean(p)
    unscaled_sample = rand(p)
    delta = unscaled_sample - λ
    # Scaling the deviation by sqrt(scale) multiplies the variance by `scale`
    # while leaving the mean at λ.
    scaled_sample = λ + sqrt(scale) * delta

    # Ensure non-negativity of the sample, though this should not be a problem
    # in practice for the scales we work at.
    return max(scaled_sample, 0.0)
end

Changepoint Analysis

By Kevin Song