Anomaly detection, an essential unsupervised machine learning task, involves identifying unusual behaviors within complex datasets and systems. While machine learning algorithms and decision support systems (DSSs) offer effective solutions for this task, simply pinpointing anomalies often falls short in real-world applications. Users of these systems frequently need insight into the reasons behind predictions, both to support root cause analysis and to build trust in the model. However, because anomaly detection is unsupervised, creating interpretable tools is challenging. This work introduces EIF+, a variant of Extended Isolation Forest (EIF) designed to improve generalization capabilities. Additionally, we present ExIFFI, a novel approach that equips Extended Isolation Forest with interpretability features, specifically feature rankings. Experimental results provide a comprehensive comparative analysis of isolation-based approaches for anomaly detection, including evaluations on synthetic and real datasets that demonstrate ExIFFI's effectiveness in providing explanations. We also illustrate how ExIFFI serves as a valid feature selection technique in unsupervised settings. To facilitate further research and reproducibility, we provide open-source code to replicate the results.