4 R's of DB Research

Reading, Rithmetic, Research and wRiting

{Outlier}Alpha – Reading Plan

December 9

My master’s thesis is on anomaly detection in large dynamic graphs, so I plan to read a selection of papers on this classic topic.

Outliers, or anomalies, are patterns in data that do not conform to a well-defined notion of normal behavior.

[Chandola et al. 2009] provides an overview of this area and a taxonomy of existing techniques.
The plan is to read classic and recent papers on this topic from the DB community, organized according to that taxonomy.

Techniques include:

* Classification
Treats outlier detection as a two-class classification problem (normal vs. anomalous), using classic models such as neural networks, SVMs, and Bayesian networks.
Challenges: feature selection and obtaining labeled training data.

* Nearest-neighbor (NN) approach
Outliers are objects whose neighborhoods are sparse, i.e., few other objects lie close to them.
Selected Readings:
- Yufei Tao, Xiaokui Xiao, and Shuigeng Zhou. Mining Distance-based Outliers from Large Databases in Any Metric Space. Proceedings of the 12th ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD), pages 394-403, 2006.

* Clustering
Outliers are objects that do not belong to any cluster.
* Statistical
* Information Theoretic
* Spectral
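The NN definition in the list above can be sketched as follows. This is a minimal illustration assuming the common (r, k) distance-based formulation: an object is an outlier if fewer than k other objects lie within distance r of it. The function name `distance_outliers` and the parameters `r` and `k` are hypothetical. The brute-force O(n²) scan is only for illustration and is not the algorithm of Tao et al., whose contribution is avoiding this cost on large databases. Because the distance function is passed in, the sketch works for any metric, not just Euclidean distance.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def distance_outliers(
    objects: Sequence[T],
    dist: Callable[[T, T], float],
    r: float,
    k: int,
) -> List[int]:
    """Hypothetical helper: indices of objects with fewer than k other
    objects within distance r (assumed (r, k) definition).

    Brute-force O(n^2) scan; `dist` can be any metric.
    """
    outliers = []
    for i, o in enumerate(objects):
        neighbors = 0
        for j, p in enumerate(objects):
            if i != j and dist(o, p) <= r:
                neighbors += 1
                if neighbors >= k:  # enough close neighbors: o is not an outlier
                    break
        if neighbors < k:
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    # Four points in a tight group plus one far-away point.
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(distance_outliers(points, math.dist, r=0.5, k=2))  # prints [4]
```

Any metric works as `dist`, e.g. `lambda a, b: abs(a - b)` on numbers, which is the "any metric space" setting the Tao et al. paper targets.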

The NN and clustering approaches are the techniques the DB community works on most often, so I will focus on them in particular.
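For the clustering approach, a sketch under stated assumptions: it uses DBSCAN as a stand-in clustering algorithm (my choice for illustration, not one prescribed by the survey), because DBSCAN's "noise" label is exactly "does not belong to any cluster". The names `dbscan` and `cluster_outliers` and the parameters `eps` and `min_pts` are illustrative; distances are Euclidean via `math.dist` (Python 3.8+), and the neighbor lists are built by brute force.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Illustrative DBSCAN: label each point with a cluster id (0, 1, ...)
    or NOISE (-1).

    A point is a core point if at least `min_pts` points (itself included)
    lie within distance `eps`; clusters grow outward from core points.
    """
    n = len(points)
    # Brute-force eps-neighborhoods (each list includes the point itself).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return labels


def cluster_outliers(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Outliers = points that DBSCAN leaves in no cluster (labelled NOISE)."""
    return [i for i, label in enumerate(dbscan(points, eps, min_pts)) if label == NOISE]


if __name__ == "__main__":
    # Two tight groups plus one isolated point.
    pts = [(0.0, 0.0), (0.0, 0.2), (0.2, 0.0), (5.0, 5.0), (5.0, 5.2), (5.2, 5.0), (10.0, 0.0)]
    print(cluster_outliers(pts, eps=0.5, min_pts=3))  # prints [6]
```

One design point worth noting: a point that is not dense enough to be a core point but lies near a cluster is kept as a border member, so only points far from every cluster are reported as outliers.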

References:
Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly Detection: A Survey. ACM Computing Surveys, Vol. 41(3), Article 15, July 2009.