Anomaly Detection for Time Series Data

Anomalies

DO NOT CONFORM TO THE EXPECTED PATTERN

Event

Observation

Noise

Where?

Health System Monitoring

Fraud Detection

Equipment Component Failure

Stock market anomalies

Digital Marketing

ML for Anomaly Detection (AD)

System evolves → context of anomalies changes

Rule-based systems need automation

  • Supervised:
    • Historical data provides intelligence
  • Unsupervised:
    • E.g. clustering

Only for Data-Driven Products?

  1. Fault detection
  2. Sensor networks
  3. Intrusion detection
  4. Data cleaning

Time Series Analyses

  1. Sequential
  2. Trend
  3. Seasonality
  4. Discretized
  5. Autocorrelation
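
As a rough illustration of the last property, lag-k autocorrelation can be estimated directly from a series. The sketch below uses NumPy and a synthetic series with a seasonal period of 12; the function name `autocorr` and the data are illustrative assumptions, not part of the talk.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic seasonal series: one cycle every 12 steps, 10 cycles in total.
t = np.arange(120)
series = np.sin(2 * np.pi * t / 12)

acf_season = autocorr(series, 12)  # points one full season apart
acf_half = autocorr(series, 6)     # points half a season apart
```

Points a full season apart move together, so `acf_season` is strongly positive, while points half a season apart move in opposite directions, so `acf_half` is strongly negative.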

Multivariate vs Univariate

 

Classification vs Regression

Common Algorithms in Use

  1. Rule-based (threshold-based):
    1. Simple
    2. Statistical Aggregates
  2. Time Series Forecasting
  3. Distance-based
  4. Supervised Learning Models

 


  • Prone to human error
  • VERY manual
  • Sequential behaviour not taken into account
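
A simple threshold rule can be sketched as below. The limits `LOW` and `HIGH` and the readings are hypothetical values chosen by hand, which is exactly the manual step the points above warn about. Each reading is also judged in isolation, so ordering plays no role.

```python
# Hand-picked limits (hypothetical): someone has to choose and maintain these.
LOW, HIGH = 10.0, 90.0

def flag_out_of_range(readings, low=LOW, high=HIGH):
    """Return indices of readings outside the fixed [low, high] band."""
    return [i for i, value in enumerate(readings) if value < low or value > high]

readings = [42.0, 45.5, 95.2, 44.1, 3.7, 43.9]
flagged = flag_out_of_range(readings)

# The rule never looks at what came before or after a point,
# so reordering the readings would flag the same values.
```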


Do not look for patterns

Sequential characteristic unobserved
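
Assuming these points refer to rules built on statistical aggregates, one common form is a z-score test against the overall mean and standard deviation. The sketch below uses NumPy; the cutoff of 3 and the synthetic data are assumptions. Because the aggregates ignore ordering, the same values are flagged however the series is arranged.

```python
import numpy as np

def zscore_outliers(x, cutoff=3.0):
    """Boolean mask of points more than `cutoff` standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.abs(z) > cutoff

rng = np.random.default_rng(0)
values = rng.normal(loc=50.0, scale=2.0, size=200)
values[120] = 80.0  # injected spike

mask = zscore_outliers(values)

# Shuffle the series: only aggregates are used, so the same values are flagged.
perm = rng.permutation(len(values))
mask_shuffled = zscore_outliers(values[perm])
```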


Does not detect slow changes
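
One way to see why slow changes can slip through: a detector that compares each point with a naive forecast (here simply the previous value, an assumption made for brevity) reacts to abrupt jumps but not to gradual drift. The threshold and both synthetic series are illustrative.

```python
import numpy as np

def residual_flags(x, threshold):
    """Flag points whose change from the previous value exceeds the threshold.

    The 'forecast' for each point is the previous observation.
    """
    x = np.asarray(x, dtype=float)
    residuals = np.abs(np.diff(x))
    # Index i + 1 is flagged when |x[i + 1] - x[i]| > threshold.
    return np.flatnonzero(residuals > threshold) + 1

# Abrupt shift: the level jumps from 10 to 15 at index 50.
step = np.concatenate([np.full(50, 10.0), np.full(50, 15.0)])

# Slow drift: the same 5-unit rise spread evenly over 100 points.
drift = np.linspace(10.0, 15.0, 100)

step_flags = residual_flags(step, threshold=1.0)
drift_flags = residual_flags(drift, threshold=1.0)
```

The step series is flagged at the jump, while the drifting series ends at the same level without a single flag.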


  • Compare distances among points
  • k-NN algorithm
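
A distance-based score in the k-NN spirit can be sketched with NumPy alone: each point is scored by its mean distance to its k nearest neighbours, and unusually large scores suggest anomalies. The choice of k = 3 and the toy 2-D points are assumptions; on larger data, scikit-learn's `NearestNeighbors` would be the more usual tool.

```python
import numpy as np

def knn_scores(points, k=3):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    points = np.asarray(points, dtype=float)
    # Pairwise distance matrix via broadcasting: shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Exclude each point's zero distance to itself.
    np.fill_diagonal(dists, np.inf)
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

points = np.array([
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [1.0, 0.8],
    [6.0, 6.0],  # far from the cluster
])
scores = knn_scores(points, k=3)
most_anomalous = int(np.argmax(scores))
```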

 

 

Hey there neighbour!


  • Do not capture temporal behaviour
  • Employ feature extraction
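
Feature extraction over sliding windows is one way to hand temporal context to a method that otherwise treats points independently. The sketch below summarises each window by its mean, standard deviation and linear slope; the window length and the choice of features are assumptions.

```python
import numpy as np

def window_features(x, window=10):
    """Return one row of (mean, std, slope) per non-overlapping window."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) // window
    t = np.arange(window)
    rows = []
    for i in range(n_windows):
        chunk = x[i * window:(i + 1) * window]
        slope = np.polyfit(t, chunk, 1)[0]  # least-squares line, leading coefficient
        rows.append((chunk.mean(), chunk.std(), slope))
    return np.array(rows)

# Flat segment followed by a steady rise of 0.5 per step.
series = np.concatenate([np.full(20, 5.0), 5.0 + 0.5 * np.arange(1, 21)])
features = window_features(series, window=10)
```

The slope column separates the flat windows from the rising ones, which is information a point-by-point comparison would not see.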

 

 


  • Decision Trees
  • Recurrent Neural Networks

 

 


Recurrent Neural Networks 

 

 

Networks with loops in them, allowing information to persist.
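
The loop can be made concrete with a single recurrent step, h_t = tanh(W_x x_t + W_h h_{t-1} + b), applied over a sequence. The NumPy sketch below uses random, untrained weights and hypothetical sizes purely to show the hidden state being carried forward; it is not a trained detector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4  # hypothetical sizes

# Random, untrained weights: for illustration only.
W_x = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

def rnn_forward(sequence):
    """Run a vanilla RNN over the sequence and return all hidden states."""
    h = np.zeros(n_hidden)
    states = []
    for x_t in sequence:
        # The previous state h feeds back in: this is the loop.
        h = np.tanh(W_x @ np.atleast_1d(x_t) + W_h @ h + b)
        states.append(h)
    return np.array(states)

seq_a = [0.0, 0.0, 1.0]
seq_b = [1.0, 0.0, 1.0]  # same final input, different history
final_a = rnn_forward(seq_a)[-1]
final_b = rnn_forward(seq_b)[-1]
```

Both sequences end on the same input, yet the final hidden states differ because the earlier inputs persist through the recurrent connection.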


Long short-term memory units

 

Remember information for long periods of time
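
How an LSTM unit holds on to information can be sketched with the standard gate equations. In the NumPy example below the weights are random and untrained and the sizes are hypothetical; the forget gate is pinned near 1 and the input gate near 0 through their biases, only to show the cell state surviving many steps. A trained model would learn when to open and close these gates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W maps [h, x] to the stacked gate pre-activations."""
    n = h.size
    z = W @ np.concatenate([h, x]) + b
    f = sigmoid(z[0 * n:1 * n])   # forget gate
    i = sigmoid(z[1 * n:2 * n])   # input gate
    o = sigmoid(z[2 * n:3 * n])   # output gate
    g = np.tanh(z[3 * n:4 * n])   # candidate values
    c_new = f * c + i * g         # cell state: keep old memory, add new
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 3  # hypothetical sizes
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden + n_in))
b = np.zeros(4 * n_hidden)
b[0:n_hidden] = 10.0               # forget gate close to 1: retain memory
b[n_hidden:2 * n_hidden] = -10.0   # input gate close to 0: ignore new input

h = np.zeros(n_hidden)
c = np.array([1.0, -1.0, 0.5])     # memory written at some earlier time
c_start = c.copy()
for _ in range(50):
    h, c = lstm_step(np.array([0.3]), h, c, W, b)
```

After 50 steps the cell state `c` is still almost equal to `c_start`, which is the "remember for long periods" behaviour in miniature.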

Python...?

SciPy Stack

Scikit-learn

Keras

TensorFlow

PyBrain

Case Study:

Leak in a system

Case Study:

(Go to IPython Notebook)

Further Applications

Speech Recognition

Handwriting Recognition

References / Useful Links

  1. https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/
  2. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  3. https://www.coursera.org/learn/machine-learning/lecture/Rkc5x/anomaly-detection-vs-supervised-learning
  4. http://info.prelert.com/blog
  5. http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
  6. Kadous, Mohammed Waleed, and Claude Sammut. "Classification of multivariate time series and structured data using constructive induction." Machine learning 58.2 (2005): 179-216.
  7. Xing, Zhengzheng, Jian Pei, and Eamonn Keogh. "A brief survey on sequence classification." ACM SIGKDD Explorations Newsletter 12.1 (2010): 40-48.

 

 

Anomaly Detection

By Shreya Khurana
