Detecting Anomalous Groups
Special Topics in Machine Learning
Rishav Chakravarti
Agenda
- Background & Motivation
- Three Categories of Anomaly Detection
- Point-based
- Aggregation-based
- Distribution-based
Individual Anomalies
"Identify individual data points that are rare due to particular combinations
of features"
-- Wong et al., 2002
Anomalous Groups
The idea that interesting patterns only emerge due to groupings.
Point-based
Find individual points which are already anomalous, then group them.
-- EPD Lecture 2
Point-based
Pros
Leverages tried & tested methods for individual anomaly detection as well as clustering techniques.
Cons
This only works when the 'anomaly' presents itself at the individual level.
Group detection can be sensitive to detection algorithm parameters.
Ultimately continues to be useful in many domains.
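A minimal sketch of the point-based idea, assuming one-dimensional data: flag individual points with a z-score test, then group the flagged points. The function names, the z-score detector, the gap-based grouping and both parameters are illustrative assumptions, not methods named in the lecture. The two parameters are exactly the kind of detection settings the grouping can be sensitive to.

```python
from statistics import mean, stdev

def flag_points(values, threshold=1.5):
    """Return the values whose absolute z-score exceeds the threshold.

    The z-score test and the default threshold are assumptions made for this sketch.
    """
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

def group_by_gap(points, max_gap=2.0):
    """Group sorted points; a new group starts when the gap exceeds max_gap."""
    groups = []
    for p in sorted(points):
        if groups and p - groups[-1][-1] <= max_gap:
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups

# Hypothetical data: ten ordinary readings and two unusually large ones.
values = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.1, 9.9, 10.4, 9.7, 30.0, 31.0]
anomalous_groups = group_by_gap(flag_points(values))
print(anomalous_groups)
```

Note that a group is only found here because its members are already outliers on their own, which is the limitation listed under Cons.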
Aggregation-based
Aggregates 'counts' of events into groups and flags groups where the aggregated count is anomalous.
Rule Based Anomaly Pattern Detection
Domain is detection of emerging disease outbreaks
Records emergency department cases (events)
Groups based on one & two component rule learning
(gender=male) and (age_decile=9)
Scores each group by comparing against historical counts
Significance tests based on randomization
Results: Reduces time to detection with minimal rise in FPR
(caveat for small p-values)
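The scoring and significance steps above can be sketched roughly as follows. This is a simplified illustration, not the procedure of Wong et al.: it scores a single rule by the difference in match rate between recent and historical records and uses a plain permutation test for significance. The record layout, function names and counts are all assumptions made up for the example.

```python
import random

def rule_rate(records, rule):
    """Fraction of records that match every (attribute, value) pair in the rule."""
    hits = sum(all(r.get(k) == v for k, v in rule.items()) for r in records)
    return hits / len(records)

def permutation_p_value(recent, historical, rule, n_shuffles=1000, seed=0):
    """Estimate how often a random recent/historical split looks at least as extreme."""
    rng = random.Random(seed)
    observed = rule_rate(recent, rule) - rule_rate(historical, rule)
    pooled = recent + historical  # new list, so the inputs are not reordered
    n_recent = len(recent)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = rule_rate(pooled[:n_recent], rule) - rule_rate(pooled[n_recent:], rule)
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_shuffles + 1)

# Hypothetical emergency department records.
rule = {"gender": "male", "age_decile": 9}
match = {"gender": "male", "age_decile": 9}
other = {"gender": "female", "age_decile": 3}
historical = [match] * 10 + [other] * 190
recent = [match] * 12 + [other] * 28

p = permutation_p_value(recent, historical, rule)
print(rule_rate(recent, rule), rule_rate(historical, rule), p)
```

With a finite number of shuffles the smallest reportable p-value is 1 / (n_shuffles + 1), which is one reason to treat very small p-values with caution.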
Distribution-based
Required to find anomalous groups whose individual points are relatively normal but which are unusual as a whole, and for which there is no obvious aggregation.



Distribution-based
Option 1
- Define a set of features that make up a group
- Learn distribution of these features
- Use traditional anomaly detection techniques
E.g. customer segment analysis
What are the pros/cons of such an approach?
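A minimal sketch of Option 1, assuming each group is summarized by two hand-picked features (mean and standard deviation of its members) and that a z-score across groups stands in for the "traditional" detector. The segment names and purchase values are invented for illustration.

```python
from statistics import mean, stdev

def group_features(points):
    """Summarize a group by (mean, standard deviation); the feature choice is an assumption."""
    return (mean(points), stdev(points))

def score_groups(groups):
    """Score each group by its largest absolute z-score over the group-level features."""
    feats = {name: group_features(pts) for name, pts in groups.items()}
    scores = {name: 0.0 for name in groups}
    for i in range(2):
        column = [f[i] for f in feats.values()]
        mu, sigma = mean(column), stdev(column)
        for name, f in feats.items():
            z = abs(f[i] - mu) / sigma if sigma > 0 else 0.0
            scores[name] = max(scores[name], z)
    return scores

# Hypothetical customer segments: seg_e holds only ordinary values,
# but it is far more homogeneous than the other segments.
segments = {
    "seg_a": [38, 43, 48, 53, 58],
    "seg_b": [42, 47, 52, 57, 62],
    "seg_c": [39, 44, 49, 54, 59],
    "seg_d": [41, 46, 51, 56, 61],
    "seg_e": [49, 50, 50, 50, 51],
}
scores = score_groups(segments)
most_unusual = max(scores, key=scores.get)
print(most_unusual, scores)
```

The result depends entirely on which group features are chosen: had only the mean been used, seg_e would have looked like the most typical segment.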
Distribution-based
Option 2
Anomaly Detection for Astronomical Data (1)
Domain is detection of 'interesting' galaxy clusters
Records spectral data from sky observations (500 dimensions)
Define distributions:
- Each observation, x, is grouped into one of M galaxy clusters.
- Each observation, x, is also assigned to one galaxy type.
- There is an overall distribution, M(Θ), over galaxy clusters.
- For each galaxy cluster, there is an expected distribution, Dir(Χ), over galaxy types.
- Each galaxy type has an expected distribution, N(β), over the spectral observations.
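One literal reading of the bullets above can be sketched as a sampling procedure. This is an illustration under strong assumptions, not the model as published: observations are one-dimensional rather than 500-dimensional, Θ is a list of cluster probabilities, Χ is a single Dirichlet parameter vector shared by all clusters, and β holds a (mean, standard deviation) pair per galaxy type. All names and parameter values are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate(theta, chi, beta, n_obs, seed=0):
    """Sample (cluster, galaxy_type, observation) triples under the assumed reading."""
    rng = random.Random(seed)
    # One distribution over galaxy types per cluster, each drawn from Dir(chi).
    type_probs = [sample_dirichlet(chi, rng) for _ in theta]
    data = []
    for _ in range(n_obs):
        cluster = rng.choices(range(len(theta)), weights=theta)[0]
        gtype = rng.choices(range(len(chi)), weights=type_probs[cluster])[0]
        mu, sigma = beta[gtype]
        data.append((cluster, gtype, rng.gauss(mu, sigma)))
    return type_probs, data

theta = [0.5, 0.3, 0.2]            # assumed probabilities over three clusters
chi = [1.0, 1.0]                   # assumed Dirichlet parameters over two galaxy types
beta = [(0.0, 1.0), (5.0, 1.0)]    # assumed (mean, std) per galaxy type
type_probs, data = generate(theta, chi, beta, n_obs=100)
print(type_probs)
print(data[:3])
```

A cluster whose mix of galaxy types is unlikely under the Dirichlet can be unusual as a group even when every observation in it is typical for its type.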
Dirichlet Genre Model
Learned using a variation of Expectation Maximization.
Use [scoring formula not recovered from the slide] to calculate anomalousness
Anomaly Detection for Astronomical Data (3)
Results: Performed better than 'single point' anomaly detection schemes for simulated/labelled data. Corroborated 'interesting' findings with experts.
What are some of the pros/cons of this approach?
Questions?
References
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003
Philip K. Chan and Matthew V. Mahoney. Modeling multiple time series for anomaly detection. In IEEE International Conference on Data Mining, 2005
Kaustav Das, Jeff Schneider, and Daniel Neill. Anomaly pattern detection in categorical datasets. In Knowledge Discovery and Data Mining (KDD), 2008
Rupali Kandhari, Shilpa Dhange, Archana Bansod, and Dr. P.K. Deshmukh. Anamoly Detection. International Conference on Computer Science & Engineering (ICCSE), 17th March-2013
Weng-Keen Wong, Andrew Moore, Gregory Cooper, and Michael Wagner. Rule-based anomaly pattern detection for detecting disease outbreaks. In American Association for Artificial Intelligence-02 Proceedings, 2002
Liang Xiong, Barnabas Poczos, Andrew Connolly, and Jeff Schneider. Anomaly Detection for Astronomical Data. Data Analysis Project, Machine Learning Department, Carnegie Mellon University, 2011
Liang Xiong, Barnabas Poczos, and Jeff Schneider. Group anomaly detection using Flexible Genre Models. In Advances in Neural Information Processing Systems 24 (NIPS 2011), 2011
Liang Xiong, Barnabas Poczos, and Jeff Schneider. Hierarchical probabilistic models for group anomaly detection. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011
Images
http://www.anorak.co.uk/wp-content/uploads/2013/02/sheep-france-wolf1.jpg
http://www.hopesteadhillfarm.com/photos/sheep-on-hill.jpg
http://www.susanstevenson.com/Journal/2010/August/1749GrayWolfP.jpg
http://upload.wikimedia.org/wikipedia/en/f/f1/Down_Arrow_Icon.png