4 R's of DB Research

Reading, Rithmetic, Research and wRiting

{Outlier}Alpha – Reading Plan

December 9

My master’s thesis work is on anomaly detection in large dynamic graphs, so I plan to read a selection of papers on this classic topic.

Outliers, or anomalies, are patterns in data that do not conform to a well-defined notion of normal behavior.

[Chandola et al. 2009] provides an overview of this area and a taxonomy of existing techniques.
The plan is to read classic and recent papers on this topic from the DB community, following that taxonomy.

Techniques include:

* Classification
A two-class classification problem, solved with classical models such as neural networks, SVMs, and Bayesian networks.
Challenges: feature selection and obtaining training data.

* Nearest neighbor (NN) approach
Outliers are objects whose neighborhoods are sparse, i.e. objects with few near neighbors.
Selected Readings:
- Yufei Tao, Xiaokui Xiao, and Shuigeng Zhou. Mining Distance-based Outliers from Large Databases in Any Metric Space. Proceedings of the 12th ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD), pages 394-403, 2006.

* Cluster
Outliers are objects that do not belong to any cluster.
* Statistical
* Information Theoretic
* Spectral

The NN and cluster approaches are the techniques the DB community most commonly works on, so I will focus on them in particular.
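To make the NN idea concrete, here is a minimal sketch that scores each object by the distance to its k-th nearest neighbor, so objects in sparse neighborhoods get high scores. This is a generic brute-force illustration, not the algorithm from the Tao et al. paper. The function name `knn_outlier_scores`, the choice of Euclidean distance, and the toy data are my own assumptions.

```python
import math


def knn_outlier_scores(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor.

    A larger score means a sparser neighborhood, i.e. a more likely outlier.
    Brute force, O(n^2) distance computations; fine only for small inputs.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point (Euclidean is an assumption;
        # any metric could be substituted here).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy data: four points on a unit square plus one point far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(points, k=2)

# Index of the point with the sparsest neighborhood.
most_isolated = max(range(len(points)), key=scores.__getitem__)
print(most_isolated, scores)
```

On this toy data the far-away point gets the largest score. A real system over a large database would need indexing or pruning rather than this quadratic scan.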

References:
Varun Chandola, Arindam Banerjee, and Vipin Kumar, “Anomaly Detection: A Survey”, ACM Computing Surveys, Vol. 41(3), Article 15, July 2009. [Slides]

posted under Data Mining, Outlier
