Anomaly Detection and Point of Change Analysis
Fall 2025 - UDel PHYS 661
dr. federica bianco
@fedhere
LR = _____________________________
True Negative
False Negative

| | H0 is True | H0 is False |
|---|---|---|
| H0 is falsified | Type I Error (False Positive) | True Positive |
| H0 is not falsified | True Negative | Type II Error (False Negative) |
(e.g. spam filtering: a false positive sends an important message to spam; a false negative lets spam into your inbox)
Anomaly detection
1
What is an outlier?
for now we'll focus on outlier events in time series: anomalies
literally, it lies outside of the distribution.
The only problem is that generally I do not know what the distribution is....
generally I will assume that points that are "far" from the "core" of the distribution are outliers, but this definition betrays that I have some belief about the distribution itself
What is an outlier?
point-wise detection (time points as outliers)
pattern-wise detection (subsequences as outliers)
system-wise detection (time series as outliers).
Anomaly in a stochastic process
remember the definition of a stationary stochastic process:
P(t) = P(t + l) for any lag l
if you believe you have a good model, you can choose a frequentist approach and define a threshold corresponding to a p-value
Note also that a lot of your inference will depend on how you generate a distribution from your data: what is the right number of bins in a histogram?
If the distribution is not stochastic I recommend you look at adaptive binning, e.g. Bayesian Blocks
https://arxiv.org/pdf/1207.5578.pdf
This is an ideal framework for anomaly detection in time series where the time of arrival is the variable recorded
what if you do not have a good model? what if the generative process is time dependent?
Anomaly in a time-varying process
A global measure of the process does not capture anomalies in time-evolving processes
import matplotlib.pyplot as pl

# global mean and scatter of the series (assumes arrays x: time, y: signal)
pl.plot(x, y)
m = y.mean()
s = y.std()
pl.axhline(m, lw=3)            # global mean
pl.axhline(m + 3 * s, c='k')   # +3 sigma threshold
pl.axhline(m - 3 * s, c='k')   # -3 sigma threshold
Anomaly in a time-varying process
simple detection methods
2.1
what if you do not have a good model? what if the generative process is time dependent?
# rolling (windowed) mean and scatter with pandas (assumes a DataFrame df with column 'y')
m = df['y'].rolling(window=10, center=True).mean()
s = df['y'].rolling(window=20, center=True).std()
df['y'].plot(lw=3)
m.plot(lw=3)
(m + 3 * s).plot(c='k')
(m - 3 * s).plot(c='k')
in this case it's still not enough!
Anomaly in a time-varying process
# rolling mean and std that exclude the central point of each window,
# so the point being tested does not contaminate its own baseline
m = df['y'].rolling(window=11, center=True).apply(
    lambda x: np.mean(np.concatenate([x[:5], x[6:]])), raw=True)
s = df['y'].rolling(window=11, center=True).apply(
    lambda x: np.std(np.concatenate([x[:5], x[6:]])), raw=True)
(m + 3 * s).plot(c='k', alpha=0.5)
(m - 3 * s).plot(c='k', alpha=0.5)
This works well for single-point anomalies
Remember that at 3 sigma you expect ~3 detections per 1000 measurements purely by chance (a two-sided 3σ threshold on Gaussian noise corresponds to p ≈ 0.0027)
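A quick check of that rate (a minimal sketch, assuming Gaussian noise and scipy):

from scipy.stats import norm

# two-sided tail probability beyond 3 sigma for a Gaussian
p = 2 * norm.sf(3)       # norm.sf(x) = 1 - norm.cdf(x)
print(p)                 # ~0.0027
print(p * 1000)          # ~2.7 expected false alarms per 1000 points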
Bayesian outlier detection
2.2
Simple case: we believe our generative process is a line
y=mx+b
and some additive noise
e~N(μ,σ)
def lnlike(theta, x, y, yerr):
    '''log likelihood of the straight-line model with Gaussian noise'''
    m, b = theta
    # line fit model
    model = m * x + b
    # variance of the data
    sig2 = yerr**2
    # Gaussian normalization
    den = 2 * np.pi * sig2
    # probability of each point under the line model
    a = 1 / np.sqrt(den) * np.exp(-(y - model)**2 / 2.0 / sig2)
    return np.sum(np.log(a))
def lnprior(theta):
    '''
    log prior on the parameters theta
    theta: 2-parameter vector: slope, intercept
    '''
    m, b = theta
    if -200 < b < 500 and 0 < m < 10.0:
        return 0.0
    return -np.inf
def lnprob(theta, x, y, yerr):
    '''log posterior: log prior + log likelihood'''
    lp = lnprior(theta)
    if not np.isfinite(lp):
        return -np.inf
    lnl = lnlike(theta, x, y, yerr)
    if np.isnan(lnl):
        return -np.inf
    return lp + lnl
Upgrade: we believe our generative process is a line
y=mx+b
and some additive noise
e~N(μ,σ)
plus a second Gaussian (background) process that can generate some of the points
Hogg, Bovy & Lang 2010, Fitting a straight line to data, Section 3 (Pruning outliers)
https://arxiv.org/pdf/1008.4686.pdf
in the presence of more than one generative process (one that generates the inliers, one that generates the outliers), each point has a probability of having been generated by either process!
Key Concept
def lnlike(theta, x, y, yerr):
    '''log likelihood of the mixture model: line (foreground) + Gaussian background'''
    m, b, Yb, Pb, V = theta
    # line fit model
    model = m * x + b
    # variance of the data
    sig2 = yerr**2
    # normalization: this is important because we have 2 linearly combined pieces of model
    den = 2 * np.pi * sig2
    # probability that the point comes from the line
    a = (1 - Pb) / np.sqrt(den) * np.exp(-(y - model)**2 / 2.0 / sig2)
    # probability that it does not (note: this b shadows the intercept, which is no longer needed here)
    b = Pb / np.sqrt(den + 2 * np.pi * V) * np.exp(-(y - Yb)**2 / 2 / (V + sig2))
    return np.sum(np.log(a + b))
Pb: probability that a point is an outlier
p_fg: probability distribution of the foreground model (inliers)
p_bg: probability distribution of the background model (outliers)
def lnprior(theta):
    '''
    log prior on the parameters theta
    theta: 5-parameter vector: slope, intercept,
    Yb: mean of the process that creates the outliers,
    Pb: probability that a point is an outlier,
    V: variance of the process that generates the outliers
    '''
    m, b, Yb, Pb, V = theta
    if -200 < b < 500 and 0 < m < 10.0:
        # Pb is a probability so it is bound to 0-1
        if Pb < 0 or Pb > 1:
            return -np.inf
        # constraints on the mean of the process that creates the outliers
        # (ymean: mean of the observed y, assumed defined globally)
        if Yb > ymean + 150 or Yb < ymean - 150:
            return -np.inf
        # the variance of the outlier process must be positive
        if V < 0:
            return -np.inf
        return 0.0
    return -np.inf
def lnprob(theta, x, y, yerr):
    '''log posterior: log prior + log likelihood (unchanged from the simple model)'''
    lp = lnprior(theta)
    if not np.isfinite(lp):
        return -np.inf
    lnl = lnlike(theta, x, y, yerr)
    if np.isnan(lnl):
        return -np.inf
    return lp + lnl
new model parameters
simple model parameters
Getting the probability that each point is an outlier
(exercise!)
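One way to set up the exercise (a sketch, not the official solution): evaluate the foreground and background terms of the mixture likelihood at a single parameter vector (e.g. the posterior median); the ratio of the background term to the total is the probability that each point is an outlier (Hogg, Bovy & Lang 2010, Section 3). Averaging this ratio over posterior samples gives the marginalized outlier probability.

def outlier_probability(theta, x, y, yerr):
    '''per-point probability of belonging to the background (outlier) process
    for a single parameter vector theta = (m, b, Yb, Pb, V)'''
    m, b, Yb, Pb, V = theta
    sig2 = yerr**2
    # foreground (line) term
    fg = (1 - Pb) / np.sqrt(2 * np.pi * sig2) * np.exp(-(y - (m * x + b))**2 / 2 / sig2)
    # background (outlier) term
    bg = Pb / np.sqrt(2 * np.pi * (sig2 + V)) * np.exp(-(y - Yb)**2 / 2 / (sig2 + V))
    return bg / (fg + bg)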
Markov Chain Monte Carlo: a stochastic, "markovian" sequence of samples
posterior: the joint probability distribution of a parameter set (m, b) conditioned upon some data D and a model hypothesis f
Goal: sample the posterior distribution
choose a starting point in the parameter space: current = θ0 = (m0, b0)
WHILE convergence criterion is NOT met:
    calculate the current posterior pcurr = P(θcurr | D, f)
    // proposal
    choose a new set of parameters new = θnew = (mnew, bnew)
    calculate the proposed posterior pnew = P(θnew | D, f)
    IF pnew/pcurr >= 1:
        current = new
    ELSE:
        // probabilistic step: accept with probability pnew/pcurr
        draw a random number r ~ U[0,1]
        IF r < pnew/pcurr:
            current = new
        ELSE:
            pass // do nothing
stochasticity allows us to explore the whole surface but spend more time in interesting spots
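A minimal sketch of this pseudocode in Python (a toy Metropolis sampler with a Gaussian proposal; it assumes the lnprob(theta, x, y, yerr) defined above and works in log space, so the acceptance ratio becomes a difference; starting point, step size and number of steps are placeholders):

import numpy as np

def metropolis(lnprob, theta0, nsteps, step_size, args=()):
    '''toy Metropolis sampler: Gaussian proposal, accept with probability pnew/pcurr'''
    rng = np.random.default_rng()
    theta = np.array(theta0, dtype=float)
    lp = lnprob(theta, *args)                 # current log posterior
    chain = np.empty((nsteps, theta.size))
    for i in range(nsteps):                   # a fixed number of steps stands in for the convergence criterion
        # proposal: perturb the current state only (memory-less, Markovian)
        proposal = theta + step_size * rng.normal(size=theta.size)
        lp_new = lnprob(proposal, *args)
        # accept if pnew/pcurr >= 1, otherwise accept with probability pnew/pcurr
        if np.log(rng.uniform()) < lp_new - lp:
            theta, lp = proposal, lp_new
        chain[i] = theta
    return chain

# e.g., for the simple line model: chain = metropolis(lnprob, [1.0, 0.0], 20000, 0.1, args=(x, y, yerr))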
Questions:
1. how do I choose the next point?
Any Markovian ergodic process
A process is Markovian if the next state of the system is determined stochastically as a perturbation of the current state of the system, and only the current state of the system, i.e. the system has no memory of earlier states (a memory-less process).
A process is ergodic if (given enough time) the entire parameter space would be sampled.
Detailed balance: at equilibrium, each elementary process should be equilibrated by its reverse process (a reversible Markov process).
Detailed balance (with respect to the posterior) guarantees that the posterior is the stationary distribution of the chain.
Metropolis, Rosenbluth, Rosenbluth & Teller 1953; Hastings 1970
If the chains are a Markovian, ergodic process,
the algorithm is guaranteed to explore the entire likelihood surface given infinite time.
This is in contrast to gradient descent, which can get stuck in local minima or at saddle points.
how to choose the next point
how you make this decision names the algorithm
simulated annealing (good for multimodal)
parallel tempering (good for multimodal)
differential evolution (good for covariant spaces)
Gibbs sampling (move along one variable at a time)
The chains: the algorithm creates a "chain" (a random walk) that "explores" the likelihood surface.
It is more efficient to run many chains in parallel, each exploring the surface: an "ensemble".
The path of the chains can be plotted along each feature (a trace plot).
[trace plots: feature value vs. step for each chain]
Examples of how to choose the next point
affine invariant sampling: the emcee package
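A minimal sketch of running the fit with emcee (assuming the lnprob defined earlier, data arrays x, y, yerr, and the emcee v3 API; the initial-guess values are placeholders):

import numpy as np
import emcee

ndim, nwalkers, nsteps = 5, 32, 5000     # 5 parameters for the mixture model: m, b, Yb, Pb, V
# initialize the walkers in a small ball around a (placeholder) starting guess
p0 = np.array([1.0, 0.0, np.mean(y), 0.1, 100.0]) + 1e-3 * np.random.randn(nwalkers, ndim)

sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=(x, y, yerr))
sampler.run_mcmc(p0, nsteps, progress=True)

# discard burn-in and flatten the walkers into one set of posterior samples
samples = sampler.get_chain(discard=1000, flat=True)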
MCMC convergence
Point of change
3
detecting a change in the generative process:
the process is piecewise stationary
Key Concept
examples:
urban lights on/off transitions,
to feed into energy demand models
and to study urban life dynamics
Brain wave transitions as different phases of sleep alternate
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5464762/#R52
Speech recognition: converting spoken utterances to words or text. Change point detection methods are applied here for audio segmentation and for recognizing boundaries between silence, sentences, words, and noise [13][14].
Chowdhury MFR, Selouani SA, O'Shaughnessy D. Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR. Int J Speech Technol. 2011;15(1):5–23.
Rybach D, Gollan C, Schluter R, Ney H. Audio segmentation for speech recognition using segment features. IEEE International Conference on Acoustics, Speech and Signal Processing. 2009:4197–4200.
change in mean -> e.g. physics state transition
change in variance -> e.g. earthquake
online = in real time:
when performed online, POC analysis is equivalent to anomaly detection
offline = considering the whole time series:
when performed offline, POC analysis is equivalent to segmentation
DEFINITION:
Point of change: the time at which the properties of the generative process (e.g. its mean or variance) change
Bayesian approach
3.1
Suppose you know there is only one point of change, and that the change is only in the mean:
simplest approach
Frequentist approach
Bayesian approach
there is an unknown partition point poc such that the distributions are equal within each block
(assume iid, ~N... all nice things)
Bayesian approach
Really we want P(poc | D)
Priors:
(A Survey of Methods for Time Series Change Point Detection, Aminikhanghahi & Cook)
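In equations (a sketch of the single-change-in-mean case with Gaussian noise; this notation is mine, not from the slides):

$$ P(\mathrm{poc}=\tau \mid D) \;\propto\; P(D \mid \tau)\,P(\tau) \;=\; P(\tau)\prod_{t \le \tau}\mathcal{N}(y_t \mid \mu_1, \sigma^2)\prod_{t > \tau}\mathcal{N}(y_t \mid \mu_2, \sigma^2) $$

with a prior P(τ) (e.g. uniform over the allowed change points), and the segment parameters μ1, μ2, σ either fit or marginalized over their own priors.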
Point of change:
n-points generalization
3.2
the problem reduces to:
choosing the best possible segmentation T
such that a target function V(T, y) is minimized
(this framing ignores Bayesian approaches)
Formally, change point detection is cast as a model selection problem, which consists in choosing the best possible segmentation T according to a quantitative criterion V(T, y) that must be minimized. The choice of the criterion function V(·) depends on preliminary knowledge of the task at hand.
Selective review of offline change point detection methods, Charles Truong, Laurent Oudre, Nicolas Vayatis
1) toy model, manual solution
2) toy model using a module (see the sketch below)
3) real data using a module
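A minimal sketch of "toy model using a module", assuming the ruptures package (the library that accompanies the Truong, Oudre & Vayatis review); the synthetic signal and the penalty value are placeholders:

import numpy as np
import ruptures as rpt

# toy piecewise-stationary signal: three segments with different means
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0, 1, 100),
                         rng.normal(5, 1, 100),
                         rng.normal(1, 1, 100)])

# PELT search with a least-squares ("l2", change in mean) cost function
algo = rpt.Pelt(model="l2").fit(signal)
breakpoints = algo.predict(pen=10)   # end index of each segment, e.g. [100, 200, 300]

rpt.display(signal, breakpoints)     # quick visual check (matplotlib figure)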
| | H0 is True | H0 is False |
|---|---|---|
| H0 is falsified | Type I Error (False Positive) | True Positive |
| H0 is not falsified | True Negative | Type II Error (False Negative) |
find a POC where there is none: False Positive
miss a POC: False Negative
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + FP + TN + FN)
TP = True Positive
FP = False Positive
TN = True Negative
FN = False Negative
In a probabilistic context, as you change the probability threshold for selecting a POC you will get a different trade-off between false positives and true positives.
## single point of change detector
# as in https://www.slideshare.net/FrankKelly3/changepoint-detection-with-bayesian-inference
# with modifications for efficiency
import numpy as np

def changeFinder(data):
    '''relative (log) probability that each index is the single point of change in the mean;
    data: 1-d numpy array'''
    n = len(data)
    datamean = data.mean()
    datasqmean = (data**2).mean()
    fac = datasqmean - datamean**2     # variance of the full series
    datacsum = data.cumsum()
    datasum = datacsum[-1]
    ppoc = np.zeros(n)  # container for point of change relative prob
    # exhaustive search over candidate points of change
    for m in range(n - 1):
        pos = m + 1
        relativePosition = pos * (n - pos)
        Q = datacsum[m] - (datasum - datacsum[m])  # cumsum up to m - cumsum after
        U = -(datamean * (n - 2 * pos) + Q)**2 / (4.0 * relativePosition) + fac
        ppoc[m + 1] = (-(n * 0.5 - 1) * np.log(n * U * 0.5) -
                       0.5 * np.log(relativePosition))
    ppoc[0] = min(ppoc[1:])            # never select the first point
    changePoint = np.argmax(ppoc)
    return {'pChange': ppoc,
            'pointOfChange': changePoint + 1,
            'meanBefore': data[:changePoint + 1].mean(),
            'meanAfter': data[changePoint + 1:].mean()}
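A quick usage sketch on synthetic data (a toy example of mine, not from the slides):

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 150),   # mean 0 before the change
                       rng.normal(2, 1, 100)])  # mean 2 after the change
result = changeFinder(data)
print(result['pointOfChange'], result['meanBefore'], result['meanAfter'])
# expect a point of change near index 150, with means near 0 and 2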
high FP rate
high TP rate
low FP rate
low TP rate
This plot style is a "stream graph".
Stream graphs are good for multivariate data as they enable the comparison of several (but not too many) time series.
Overall changes in volume (and therefore in variance) are going to be more obvious than trends.
This particular image is a screenshot from an interactive tool
http://advanse.lirmm.fr/hierarchical/visualize.php
When you design an interactive tool you want to keep two principles in mind:
1. The tool should show context/overview first; details should be available on demand (in this tool you can select a portion of the time series and expand it).
2. Access to details should be immediate: people are really patient... for 3 seconds. Then they lose interest and attention.
The Unofficial Google Data Science Blog
http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html
Matlab Kalman Filter tutorial videos (super clear!) https://www.youtube.com/watch?v=mwn8xhgNpFY
A Survey of Methods for Time Series Change Point Detection
Samaneh Aminikhanghahi and Diane J. Cook
Selective review of offline change point detection methods
Charles Truong, Laurent Oudre, Nicolas Vayatis
Choose any one method and one cost function described in this paper and read about it in detail. You should be prepared to write the cost function, describe it, and discuss the merits of the method and its applications (e.g. is it good for 1 POC or N POCs?). Quizzes may contain questions about this.