Anomaly Detection (AD) is the task of identifying unusual behavior in complex datasets and systems. Machine Learning algorithms and Decision Support Systems (DSSs) offer effective solutions for this task, but in real-world applications pinpointing anomalies may not be enough: users also need insight into the rationale behind each prediction, both to support root cause analysis and to build trust in the model. The unsupervised nature of AD, however, makes interpretable tools difficult to develop. This paper addresses that challenge by introducing ExIFFI, a novel interpretability approach designed specifically to explain the predictions of the Extended Isolation Forest (EIF). ExIFFI uses feature importance to provide explanations at both the global and the local level. This work also introduces EIF+, an enhanced variant of EIF conceived to improve its generalization capabilities through a different strategy for designing the splitting hyperplanes. A comprehensive comparative analysis on both synthetic and real-world datasets evaluates various unsupervised AD approaches and demonstrates the effectiveness of ExIFFI in explaining AD predictions. The paper further explores the utility of ExIFFI as a feature selection technique in unsupervised settings. Finally, the code is released as open source to facilitate further investigation and reproducibility.