Isolation Forest (iForest) is a widely used unsupervised anomaly detector, valued for its runtime efficiency and strong performance on large-scale tasks. Despite this widespread adoption, the theoretical foundation of iForest's success remains unclear. This paper studies the inductive bias of iForest in order to characterize, theoretically, under what circumstances and to what extent iForest works well. The key step is to formalize the growth process of iForest, in which both the split dimensions and the split values are selected at random. We model this growth process as a random walk, which lets us derive the expected depth function, the output of iForest, from the transition probabilities. Case studies reveal key inductive biases: compared to $k$-Nearest Neighbor, iForest is less sensitive to central anomalies while showing greater parameter adaptability. Our study provides a theoretical understanding of the effectiveness of iForest and establishes a foundation for further theoretical exploration.
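The growth process described in the abstract (a random split dimension and a random split value at each node) can be illustrated with a small simulation. The sketch below is not the paper's random-walk derivation; it is a Monte Carlo estimate of the expected depth at which a given point is isolated. The function names `isolation_depth` and `expected_depth`, the toy data, the choice to draw the split dimension only among non-constant dimensions, the absence of subsampling, and the depth cap of 64 are all assumptions made for illustration.

```python
import random


def isolation_depth(data, idx, rng, max_depth=64):
    """Depth at which data[idx] is isolated in one randomly grown tree.

    Only the branch containing the query point is followed, since that
    branch alone determines the point's depth.
    """
    node = list(range(len(data)))  # indices of the points in the current node
    dims = len(data[0])
    depth = 0
    while len(node) > 1 and depth < max_depth:
        # Assumption: choose only among dimensions that can still be split.
        splittable = [
            d for d in range(dims)
            if min(data[i][d] for i in node) < max(data[i][d] for i in node)
        ]
        if not splittable:
            break  # all remaining points are identical
        dim = rng.choice(splittable)
        lo = min(data[i][dim] for i in node)
        hi = max(data[i][dim] for i in node)
        split = rng.uniform(lo, hi)  # random split value within the node's range
        goes_left = data[idx][dim] < split
        # Keep only the points that fall on the same side as the query point.
        node = [i for i in node if (data[i][dim] < split) == goes_left]
        depth += 1
    return depth


def expected_depth(data, idx, n_trees=500, seed=0):
    """Average isolation depth of data[idx] over n_trees random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(data, idx, rng) for _ in range(n_trees))
    return total / n_trees


# Toy 1-D data: ten evenly spaced points in [0, 0.9] plus one far-away point.
cluster = [(i / 10,) for i in range(10)]
data = cluster + [(10.0,)]

outlier_depth = expected_depth(data, idx=10)  # the point at 10.0
central_depth = expected_depth(data, idx=5)   # the point at 0.5
print(outlier_depth, central_depth)
```

In this toy setting the distant point is usually separated by the first split, so its estimated expected depth is smaller than that of a point inside the cluster. A shallower expected depth is what iForest reads as a higher anomaly score.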