{Outlier}Alpha – Reading Plan
My master’s thesis work is on anomaly detection in large dynamic graphs, so I plan to read a selection of papers on this classic topic.
Outliers, or anomalies, are patterns in data that do not conform to a well-defined notion of normal behavior.
[Chandola et al. 2009] provides an overview of this area and a taxonomy of existing techniques.
The plan is to read classic and recent papers on this topic from the DB community, organized according to that taxonomy.
Techniques include:
* Classification
Treated as a two-class classification problem, using classical models such as neural networks, SVMs, and Bayesian networks.
Challenges: feature selection and obtaining labeled training data.
* Nearest-neighbor (NN) approach
Outliers are objects whose near neighbors are sparse.
Selected Readings:
- Yufei Tao, Xiaokui Xiao, and Shuigeng Zhou. Mining Distance-based Outliers from Large Databases in Any Metric Space. Proceedings of the 12th ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD), pages 394-403, 2006.
* Cluster
Outliers are objects that do not belong to any cluster.
* Statistical
* Information Theoretic
* Spectral
The NN and cluster approaches are the techniques the DB community most commonly works on, so I will focus on these two in particular.
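To make the NN and cluster definitions above concrete, here is a minimal, naive sketch of both ideas. It is not the algorithm from any of the selected readings. The function names, the choice of Euclidean distance, and the parameters `k`, `radius`, and `min_size` are my own assumptions for illustration, and both functions run in quadratic time, so they are only suitable for toy data.

```python
import math


def knn_distance_scores(points, k):
    """NN view: score each point by the distance to its k-th nearest neighbor.

    A large score means the point's neighborhood is sparse.
    Assumes Euclidean distance and len(points) > k (illustrative choices).
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def small_cluster_members(points, radius, min_size):
    """Cluster view: link points closer than `radius` into groups and
    return the indices of points whose group has fewer than `min_size` members.

    This is a simple connected-components grouping, used here as a stand-in
    for a real clustering algorithm.
    """
    n = len(points)
    label = [-1] * n          # -1 means "not assigned to a group yet"
    clusters = []
    for start in range(n):
        if label[start] != -1:
            continue
        cluster_id = len(clusters)
        label[start] = cluster_id
        members = [start]
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and math.dist(points[i], points[j]) <= radius:
                    label[j] = cluster_id
                    members.append(j)
                    stack.append(j)
        clusters.append(members)
    return sorted(i for c in clusters if len(c) < min_size for i in c)


# Toy data: four points in a tight square and one point far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_distance_scores(points, k=2)
isolated = small_cluster_members(points, radius=1.5, min_size=2)
```

On this toy data both views single out the distant point: it has the largest k-th neighbor distance, and it is the only member of its group. The two views can disagree on real data, for example when a small but dense group of anomalies forms its own cluster.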
References:
Varun Chandola, Arindam Banerjee, and Vipin Kumar, “Anomaly Detection: A Survey”, ACM Computing Surveys, Vol. 41(3), Article 15, July 2009. [Slides]

