Most existing outlier-detection algorithms struggle to detect point anomalies and cluster anomalies accurately at the same time. Moreover, several K-nearest-neighbor-based anomaly-detection methods perform well on many datasets, but their sensitivity to the value of K remains a critical issue. To address these challenges, we propose a novel robust anomaly-detection method called Entropy Density Ratio Outlier Detection (EDROD). The method uses the probability density of each sample as a global feature and the local entropy around each sample as a local feature, and combines them into a comprehensive abnormality indicator for each sample, referred to in this paper as the Entropy Density Ratio (EDR). Comparisons with several competing anomaly-detection methods on both synthetic and real-world datasets show that EDROD detects point anomalies and cluster anomalies simultaneously and accurately. EDROD is also strongly robust to the number of selected neighboring samples, the dimension of the samples, and the size of the dataset. The proposed method can therefore be applied to a variety of real-world datasets to detect anomalies accurately and robustly.
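The idea of combining a global density feature with a local entropy feature can be illustrated with a small sketch. The abstract does not give the exact definition of EDR, so everything below is an assumption made for illustration only: the function name `edr_scores`, the Gaussian kernel density estimate with a fixed `bandwidth`, the Shannon entropy computed over the normalized densities of a sample and its `k` nearest neighbors, and the choice to divide entropy by density. It should not be read as the authors' formulation.

```python
import numpy as np


def edr_scores(X, k=10, bandwidth=1.0):
    """Illustrative entropy/density score (hypothetical, not the paper's exact EDR).

    Higher scores are intended to indicate more anomalous samples.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)

    # Global feature (assumed): Gaussian kernel density estimate.
    # The normalizing constant is dropped because it is the same for every sample.
    density = np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)

    # Indices of the k nearest neighbors of each sample, excluding the sample itself.
    d2_no_self = d2.copy()
    np.fill_diagonal(d2_no_self, np.inf)
    neighbors = np.argsort(d2_no_self, axis=1)[:, :k]

    # Local feature (assumed): Shannon entropy of the densities of each sample
    # and its neighbors, normalized to sum to one within the neighborhood.
    idx = np.concatenate([np.arange(n)[:, None], neighbors], axis=1)
    p = density[idx]
    p = p / p.sum(axis=1, keepdims=True)
    entropy = -(p * np.log(p)).sum(axis=1)

    # Assumed combination: local entropy divided by global density.
    return entropy / density


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # One Gaussian cluster plus a single far-away point appended as the last row.
    data = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])
    scores = edr_scores(data, k=10, bandwidth=1.0)
    print("index of highest score:", int(np.argmax(scores)))
```

In this sketch the density term dominates for isolated points, while the entropy term reflects how unevenly density is spread across a sample's neighborhood. The actual weighting between the two features in EDROD may differ.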