Anomaly Detection for Time Series Data
Anomalies
DO NOT CONFORM TO THE EXPECTED PATTERN
Event
Observation
Noise
Where?
Health System Monitoring
Fraud Detection
Equipment Component Failure
Stock market anomalies
Digital Marketing
ML for Anomaly Detection (AD)
System evolves -> context of anomalies changes
Rule-based systems need automation
 Supervised:
 Historical data provides intelligence
 Unsupervised:
 E.g. clustering
Only for Data-Driven Products?
 Fault detection
 Sensor networks
 Intrusion Detection
 Data cleaning
Time Series Analyses
 Sequential
 Trend
 Seasonality
 Discretized
 Autocorrelation
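Autocorrelation can be made concrete with a short sketch. The function below computes the sample autocorrelation at a given lag using only the standard library. The `seasonal` series is made-up illustration data with a period of 4, not data from the talk.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series carries no correlation information
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Hypothetical series that repeats every 4 steps
seasonal = [1.0, 2.0, 3.0, 2.0] * 6
acf_full_period = autocorrelation(seasonal, 4)  # strongly positive
acf_half_period = autocorrelation(seasonal, 2)  # strongly negative
```

A high value at lag 4 and a negative value at lag 2 is the kind of signature a seasonal series leaves behind.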
Time Series Analyses
Multivariate vs Univariate
Classification vs Regression
Common Algorithms in Use
 Rule-based (threshold-based):
 Simple
 Statistical Aggregates
 Time Series Forecasting
 Distance-based
 Supervised Learning Models
Simple thresholds:
 Prone to human error
 VERY manual
 Sequential behaviour not taken into account
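A minimal sketch of a simple threshold check, assuming fixed lower and upper limits chosen by hand. The readings and limits are hypothetical. Each point is judged on its own, which is why the order of the readings plays no role.

```python
def threshold_anomalies(values, lower, upper):
    """Return the indices of values outside the interval [lower, upper]."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical sensor readings with hand-picked limits
readings = [4.9, 5.1, 5.0, 7.8, 5.2, 1.3, 5.0]
flagged = threshold_anomalies(readings, lower=3.0, upper=7.0)
```

Every limit has to be picked and maintained by a person, which is where the manual effort and the room for human error come from.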
Statistical aggregates:
 Do not look for patterns
 Sequential characteristic unobserved
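One common statistical aggregate is the z-score: how many standard deviations a point lies from the mean. The sketch below is an illustration under that assumption, with made-up readings and an arbitrary cut-off of 2.5. It also shows that sorting the data flags the same value, since the aggregate ignores order.

```python
import statistics


def zscore_anomalies(values, z_limit=3.0):
    """Return indices whose absolute z-score exceeds `z_limit`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / std > z_limit]


# Hypothetical readings with one obvious outlier at index 7
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 25.0, 10.0, 9.9, 10.2, 10.0]
flagged = zscore_anomalies(readings, z_limit=2.5)

# Sorting destroys the time order, yet the same value is still the one flagged
flagged_sorted = zscore_anomalies(sorted(readings), z_limit=2.5)
```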
Time series forecasting:
 Does not detect slow changes
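A sketch of forecast-based detection, assuming simple exponential smoothing as the forecaster (a stand-in, the talk does not name a specific model). A point is flagged when its one-step-ahead forecast error exceeds a tolerance. The `alpha` and `tolerance` values and both series are illustrative.

```python
def forecast_residual_anomalies(values, alpha=0.5, tolerance=3.0):
    """Flag points whose one-step-ahead forecast error exceeds `tolerance`."""
    if not values:
        return []
    flagged = []
    forecast = values[0]  # start the forecast at the first observation
    for i in range(1, len(values)):
        if abs(values[i] - forecast) > tolerance:
            flagged.append(i)
        # Exponential smoothing update, applied to every observation
        forecast = alpha * values[i] + (1 - alpha) * forecast
    return flagged


# Hypothetical data: a sudden spike versus a slow upward drift
spike = [10.0, 10.0, 10.0, 15.0, 10.0, 10.0]
drift = [10.0 + 0.5 * t for t in range(20)]

spike_flags = forecast_residual_anomalies(spike)
drift_flags = forecast_residual_anomalies(drift)
```

The spike is caught, but the drift nearly doubles the value without a single flag because the forecast keeps adapting to it.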
Distance-based:
 Compare distances among points
 kNN algorithm (k-nearest neighbours)
Hey there neighbour!
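A minimal sketch of a kNN-style anomaly score, assuming the score is the mean Euclidean distance to the k nearest neighbours (one of several possible definitions). The points are made up. Libraries such as scikit-learn offer tuned nearest-neighbour search; this version uses only the standard library.

```python
import math


def knn_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(sum(distances[:k]) / k)
    return scores


# Hypothetical 2-D points: a tight cluster plus one far-away point
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (5.0, 5.0)]
scores = knn_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

The point with the largest score has no close neighbours and is the natural anomaly candidate.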
 Do not capture temporal behaviour
 Employ feature extraction
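Feature extraction can give a point-based method some temporal context. The sketch below summarises each sliding window of a series as a small feature tuple. The three features (mean, range, net change) are an illustrative choice, not a set prescribed by the talk.

```python
def window_features(values, width):
    """Summarise each sliding window as (mean, range, last minus first)."""
    features = []
    for start in range(len(values) - width + 1):
        window = values[start:start + width]
        features.append((
            sum(window) / width,        # level of the window
            max(window) - min(window),  # spread within the window
            window[-1] - window[0],     # net change across the window
        ))
    return features


# Hypothetical steadily rising series
series = [1.0, 2.0, 3.0, 4.0, 5.0]
features = window_features(series, width=3)
```

Each tuple can then be fed to a distance-based or supervised model in place of the raw points.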
Supervised learning models:
 Decision Trees
 Recurrent Neural Networks
Recurrent Neural Networks
Networks with loops in them, allowing information to persist.
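The loop can be sketched with a single scalar recurrence, h_t = tanh(w_x * x_t + w_h * h_(t-1) + b). The weights below are hand-picked for illustration, not trained values.

```python
import math


def rnn_states(inputs, w_x=1.0, w_h=0.5, b=0.0):
    """Run a one-unit recurrent network and return every hidden state."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The previous state h feeds back into the new state: this is the loop
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# A single input pulse followed by silence
states = rnn_states([1.0, 0.0, 0.0])
```

The state stays non-zero after the input has gone quiet, so information from the first step persists, although it fades with each step.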
Long short-term memory (LSTM) units
Remember information for long periods of time
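A sketch of one scalar LSTM step using the standard gate equations. The `params` layout and all weights are assumptions made for illustration: the forget gate is biased towards 1 and the input gate towards 0, so the cell state is expected to hold its value across many steps. In practice a library such as Keras would supply a trained LSTM layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    `params` maps each gate name ("f", "i", "o", "g") to a tuple of
    (input weight, recurrent weight, bias).
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("f"))    # forget gate: how much old memory to keep
    i = sigmoid(pre_activation("i"))    # input gate: how much new content to write
    o = sigmoid(pre_activation("o"))    # output gate: how much memory to expose
    g = math.tanh(pre_activation("g"))  # candidate content
    c = f * c_prev + i * g              # updated cell state (the long-term memory)
    h = o * math.tanh(c)                # updated hidden state
    return h, c


# Hand-picked weights: forget gate close to 1, input gate close to 0
params = {
    "f": (0.0, 0.0, 10.0),
    "i": (0.0, 0.0, -10.0),
    "o": (0.0, 0.0, 0.0),
    "g": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0  # start with a value of 1.0 stored in the cell
for x in [0.3, -0.8, 0.5] * 10:  # 30 steps of arbitrary input
    h, c = lstm_step(x, h, c, params)
```

After 30 steps the cell state is still close to its starting value, in contrast to the fading state of a plain recurrent unit.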
Python...?
SciPy Stack
Scikit-learn
Keras
TensorFlow
PyBrain
Case Study:
Leak in a system
(Go to IPython Notebook)
Further Applications
Speech Recognition
Handwriting Recognition
References / Useful Links

https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

https://www.coursera.org/learn/machine-learning/lecture/Rkc5x/anomaly-detection-vs-supervised-learning

http://info.prelert.com/blog

http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

Kadous, Mohammed Waleed, and Claude Sammut. "Classification of multivariate time series and structured data using constructive induction." Machine Learning 58.2 (2005): 179-216.

Xing, Zhengzheng, Jian Pei, and Eamonn Keogh. "A brief survey on sequence classification." ACM SIGKDD Explorations Newsletter 12.1 (2010): 40-48.
Anomaly Detection
By Shreya Khurana