Anomaly Detection for Time Series Data
Anomalies
Do not conform to the expected pattern
Event
Observation
Noise
Where is anomaly detection used?
Health System Monitoring
Fraud Detection
Equipment Component Failure
Stock Market Anomalies
Digital Marketing
Machine Learning for Anomaly Detection (ML for AD)
- As a system evolves, the context of anomalies changes
- Rule-based systems need automation
- Supervised:
  - Historical (labelled) data provides intelligence
- Unsupervised:
  - E.g. clustering
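A minimal sketch of the unsupervised route, assuming scikit-learn's `DBSCAN` as the clustering algorithm (the slide only says "clustering"): points that DBSCAN leaves unassigned as noise (label `-1`) are treated as anomalies. The function name `dbscan_anomalies`, the `eps` and `min_samples` values, and the synthetic data are illustrative choices, not taken from the talk.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_anomalies(values, eps=0.3, min_samples=5):
    """Return indices of points that DBSCAN labels as noise (-1)."""
    # DBSCAN expects a 2-D array of shape (n_samples, n_features).
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return np.flatnonzero(labels == -1)


# Synthetic readings: tight noise around 0 plus two injected outliers.
rng = np.random.default_rng(0)
series = rng.normal(0.0, 0.1, size=200)
series[[50, 150]] = [3.0, -3.0]
print(dbscan_anomalies(series))
```

Clustering of this kind treats the values as an unordered set, so it needs no labels but also ignores the time order of the series.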
Only for Data-Driven Products?
- Fault detection
- Sensor networks
- Intrusion Detection
- Data cleaning
Time Series Analysis
- Sequential
- Trend
- Seasonality
- Discretized
- Autocorrelation
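To make trend, seasonality and autocorrelation concrete, here is a minimal NumPy sketch, assuming a linear trend (removed by a least-squares line fit) and a hypothetical 12-step season. The helper names `detrend` and `autocorrelation` are illustrative.

```python
import numpy as np


def detrend(x):
    """Remove a least-squares linear trend from a 1-D series."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)


def autocorrelation(x, lag):
    """Sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)


# Hypothetical series: upward trend plus a season of length 12.
t = np.arange(120)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)
residual = detrend(series)
print(round(autocorrelation(residual, 12), 2))  # strong positive: one full season apart
print(round(autocorrelation(residual, 6), 2))   # strong negative: half a season apart
```

Without detrending, the trend dominates and the autocorrelation stays positive even at lag 6, which hides the seasonal structure.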
Multivariate vs Univariate
Classification vs Regression
Common Algorithms in Use
- Rule-based:
- Simple
- Statistical Aggregates
- Time Series Forecasting
- Distance-based
- Supervised Learning Models
Simple thresholds:
- Prone to human error
- VERY manual
- Sequential behaviour not taken into account
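A minimal sketch of a simple static threshold, assuming hand-picked lower and upper bounds (the band 10..30 and the readings are hypothetical). Each reading is judged on its own, which is exactly why the approach is manual and blind to sequential behaviour.

```python
import numpy as np


def static_threshold(values, lower, upper):
    """Return indices of readings outside a hand-set [lower, upper] band."""
    values = np.asarray(values, dtype=float)
    return np.flatnonzero((values < lower) | (values > upper))


# Hypothetical temperature readings; the bounds are chosen by hand.
readings = [20.1, 20.4, 35.0, 19.8, 2.0, 21.0]
print(static_threshold(readings, lower=10, upper=30))
```

Shuffling `readings` would flag the same values, since the order of observations plays no part in the decision.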
Statistical aggregates:
- Do not look for patterns
- Sequential characteristics go unobserved
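A minimal sketch of a statistical-aggregate detector, assuming a rolling mean and standard deviation over the preceding `window` points and a z-score cut-off (`window=20` and `z=3.0` are hypothetical defaults). The mean and standard deviation of a window do not change if the values inside it are reordered, so the shape of the recent sequence is never examined.

```python
import numpy as np


def rolling_zscore_anomalies(values, window=20, z=3.0):
    """Flag x[i] when it lies more than z standard deviations from the
    mean of the preceding `window` values."""
    x = np.asarray(values, dtype=float)
    flagged = []
    for i in range(window, len(x)):
        history = x[i - window:i]
        mu, sigma = history.mean(), history.std()
        if sigma == 0:
            # A perfectly flat history: any change at all is a deviation.
            is_anomaly = x[i] != mu
        else:
            is_anomaly = abs(x[i] - mu) > z * sigma
        if is_anomaly:
            flagged.append(i)
    return flagged


series = [1.0] * 30 + [5.0] + [1.0] * 5
print(rolling_zscore_anomalies(series))
```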
Time series forecasting:
- Does not detect slow changes
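A minimal sketch of forecast-based detection, assuming simple exponential smoothing as the forecaster (the slides do not name one) and a fixed threshold on the one-step-ahead forecast error. `alpha=0.3` and `threshold=5.0` are hypothetical. Because the smoothed level keeps adapting, a slow drift is absorbed into the forecast and never produces a large error, while a sudden jump does.

```python
import numpy as np


def ses_residual_anomalies(values, alpha=0.3, threshold=5.0):
    """One-step-ahead simple exponential smoothing; flag points whose
    forecast error exceeds `threshold` (in the data's units)."""
    x = np.asarray(values, dtype=float)
    level = x[0]
    flagged = []
    for i in range(1, len(x)):
        forecast = level  # SES forecasts the current smoothed level
        if abs(x[i] - forecast) > threshold:
            flagged.append(i)
        # Update the level with the new observation.
        level = alpha * x[i] + (1 - alpha) * level
    return flagged


spike = [0.0] * 50 + [10.0] + [0.0] * 50
drift = [0.01 * t for t in range(1000)]  # drifts by about 10 units overall
print(ses_residual_anomalies(spike))  # the sudden jump is flagged
print(ses_residual_anomalies(drift))  # the slow drift is not
```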
Distance-based:
- Compare distances among points
- k-NN algorithm
Hey there, neighbour!
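A minimal k-NN sketch, assuming the anomaly score is the distance from each point to its k-th nearest neighbour (one common choice; averaging over the k nearest is another). The function name and the toy points are illustrative; brute-force pairwise distances keep it dependency-free.

```python
import numpy as np


def knn_distance_scores(points, k=3):
    """Anomaly score = distance from each point to its k-th nearest neighbour."""
    X = np.asarray(points, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Column 0 of each sorted row is the point's zero distance to itself.
    return np.sort(dists, axis=1)[:, k]


points = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]]
scores = knn_distance_scores(points, k=3)
print(int(np.argmax(scores)))  # index of the most isolated point
```

The pairwise matrix needs O(n²) memory; for larger data, scikit-learn's `NearestNeighbors.kneighbors` returns the same neighbour distances more efficiently.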
Distance-based (continued):
- Do not capture temporal behaviour
- Employ feature extraction
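One way to give a point-wise method some view of temporal behaviour is to extract features from sliding windows. This is a minimal sketch under that assumption; the choice of mean, standard deviation, min, max and net change as features, and the name `window_features`, are illustrative.

```python
import numpy as np


def window_features(values, window):
    """Summarise each sliding window by mean, std, min, max and net change."""
    x = np.asarray(values, dtype=float)
    rows = []
    for start in range(len(x) - window + 1):
        w = x[start:start + window]
        rows.append([w.mean(), w.std(), w.min(), w.max(), w[-1] - w[0]])
    return np.array(rows)


features = window_features([1, 2, 3, 4, 3, 2, 1, 9, 1, 2], window=4)
print(features.shape)  # one row per window, five features per row
```

Each row can then be treated as a single point by a distance-based detector, so "distance" now compares short stretches of the series rather than isolated readings.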
Supervised learning models:
- Decision Trees
- Recurrent Neural Networks
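A minimal supervised sketch, assuming labelled history and scikit-learn's `DecisionTreeClassifier` trained on hand-crafted window features. The synthetic spikes, the window length of 10, `max_depth=3` and the helper names are all hypothetical. The split is chronological so the model is evaluated on data that comes after its training data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def windows_and_labels(series, point_labels, window):
    """Cut a series into sliding windows; a window is anomalous (1) if it
    contains any labelled anomalous point."""
    n = len(series) - window + 1
    X = np.array([series[i:i + window] for i in range(n)])
    y = np.array([point_labels[i:i + window].max() for i in range(n)])
    return X, y


def summarise(X):
    """Hand-crafted features per window: mean, std and range."""
    return np.column_stack([X.mean(axis=1), X.std(axis=1), X.max(axis=1) - X.min(axis=1)])


# Synthetic labelled history: Gaussian noise with 20 injected spikes.
rng = np.random.default_rng(42)
series = rng.normal(0.0, 1.0, size=600)
labels = np.zeros(600, dtype=int)
spikes = rng.choice(np.arange(10, 590), size=20, replace=False)
series[spikes] += 8.0
labels[spikes] = 1

X, y = windows_and_labels(series, labels, window=10)
split = int(0.7 * len(X))  # chronological split: train on the past
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(summarise(X[:split]), y[:split])
accuracy = (clf.predict(summarise(X[split:])) == y[split:]).mean()
print(f"held-out window accuracy: {accuracy:.2f}")
```

The tree only sees the summary features, not the order of values inside a window; recurrent networks are the alternative that consumes the sequence step by step.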
Recurrent Neural Networks
Networks with loops in them, allowing information to persist.
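A minimal NumPy sketch of the "loop" in a vanilla (Elman) RNN, assuming a tanh activation and randomly initialised weights (the shapes and seed are arbitrary). Two input sequences that differ only at the first step end in different final states, which is the persistence the slide describes.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state.

    inputs: shape (T, D); W_xh: (H, D); W_hh: (H, H); b_h: (H,)
    """
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        # The loop: the new state depends on the input AND the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.array(states)


rng = np.random.default_rng(1)
H, D, T = 4, 1, 5
W_xh = rng.normal(size=(H, D))
W_hh = rng.normal(scale=0.5, size=(H, H))
b_h = np.zeros(H)

seq_a = np.zeros((T, D))
seq_b = seq_a.copy()
seq_b[0, 0] = 1.0  # differs only at the first time step

final_a = rnn_forward(seq_a, W_xh, W_hh, b_h)[-1]
final_b = rnn_forward(seq_b, W_xh, W_hh, b_h)[-1]
print(np.abs(final_a - final_b).max())  # non-zero: the first input persisted
```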
Long Short-Term Memory (LSTM) units
Remember information for long periods of time
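A minimal NumPy sketch of one LSTM step, assuming the standard input/forget/output gates and a tanh candidate, stacked in an order chosen here for illustration (libraries such as Keras use their own internal ordering). With the forget gate biased fully open and the input gate biased shut (hand-set, hypothetical biases), the cell state survives 100 steps almost unchanged, which is how LSTMs retain information over long periods.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    Gate blocks are stacked in the (assumed) order input, forget, output,
    candidate: W has shape (4H, D), U has shape (4H, H), b has shape (4H,).
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much cell state to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate content
    c = f * c_prev + i * g       # additive cell-state update
    h = o * np.tanh(c)
    return h, c


# Hand-set biases: forget gate saturated open, input gate saturated shut.
H, D = 2, 1
W = np.zeros((4 * H, D))
U = np.zeros((4 * H, H))
b = np.concatenate([np.full(H, -20.0), np.full(H, 20.0), np.zeros(H), np.zeros(H)])

h, c = np.zeros(H), np.array([0.7, -0.3])
for _ in range(100):
    h, c = lstm_step(np.array([1.0]), h, c, W, U, b)
print(c)  # still approximately [0.7, -0.3] after 100 steps
```

The additive update `c = f * c_prev + i * g` is the design choice that matters: when `f` is close to 1, the stored value is carried forward instead of being repeatedly squashed through a nonlinearity, as happens in a vanilla RNN.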
Python...?
SciPy Stack
Scikit-learn
Keras
TensorFlow
PyBrain
Case Study: Leak in a System
(Go to IPython Notebook)
Further Applications
Speech Recognition
Handwriting Recognition
References / Useful Links
- https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/
- http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- https://www.coursera.org/learn/machine-learning/lecture/Rkc5x/anomaly-detection-vs-supervised-learning
- http://info.prelert.com/blog
- http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
- Kadous, Mohammed Waleed, and Claude Sammut. "Classification of multivariate time series and structured data using constructive induction." Machine Learning 58.2 (2005): 179-216.
- Xing, Zhengzheng, Jian Pei, and Eamonn Keogh. "A brief survey on sequence classification." ACM SIGKDD Explorations Newsletter 12.1 (2010): 40-48.
Anomaly Detection
By Shreya Khurana