Neighbor-based methods have become a powerful tool for outlier detection: they infer how abnormal a sample is from the compactness of the sample and its neighbors. However, existing methods commonly focus on designing different procedures to locate outliers in a dataset, while the contributions of different types of neighbors to outlier detection have not been well discussed. To this end, this paper studies the neighbors used in existing outlier detection algorithms and introduces a taxonomy that uses three levels of components (information, neighbor, and methodology) to define hybrid methods. This taxonomy can serve as a paradigm in which a novel neighbor-based outlier detection method is proposed by combining different components. Extensive comparative experiments were conducted on synthetic and real-world datasets, covering performance comparison and a case study. The results show that reverse K-nearest-neighbor-based methods achieve promising performance and that the dynamic selection method is well suited to high-dimensional spaces. Notably, the experiments verify that rationally selecting components from this taxonomy may yield an algorithm superior to existing methods.
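To make the two neighbor notions mentioned in the abstract concrete, the following is a minimal sketch, not the paper's method. It assumes Euclidean distance and a brute-force neighbor search, and the function names (`knn_indices`, `knn_distance_score`, `reverse_knn_count`) are hypothetical, chosen only for illustration. The first score treats a sample as more abnormal when its K nearest neighbors are far away; the second counts how many other samples include it among their own K nearest neighbors, so a low count suggests an outlier.

```python
import numpy as np


def knn_indices(X, k):
    """Return (indices of the k nearest other rows, full distance matrix)."""
    # Brute-force pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, dist


def knn_distance_score(X, k):
    """Mean distance to the k nearest neighbors; larger means more abnormal."""
    idx, dist = knn_indices(X, k)
    return np.take_along_axis(dist, idx, axis=1).mean(axis=1)


def reverse_knn_count(X, k):
    """How many samples list each sample among their k nearest neighbors;
    smaller means more abnormal."""
    idx, _ = knn_indices(X, k)
    return np.bincount(idx.ravel(), minlength=X.shape[0])


if __name__ == "__main__":
    # Illustrative toy data: a tight cluster plus one distant sample (last row).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], dtype=float)
    print(knn_distance_score(X, k=2))
    print(reverse_knn_count(X, k=2))
```

On the toy data, the distant sample receives the largest K-nearest-neighbor distance and appears in no other sample's neighbor list. The brute-force distance matrix needs O(n²) memory, so a tree-based or approximate neighbor index would be the usual substitute for larger datasets.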