Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study

Detecting anomalous situations in complex mission-critical systems is of paramount importance when their service continuity must be ensured. A major challenge in detecting anomalies from operational data is the imbalanced class distribution, since anomalies are by nature rare events. This paper evaluates a diverse array of machine learning-based anomaly detection algorithms through a comprehensive benchmark study. Its main contribution is an unbiased comparison of anomaly detection algorithms spanning classical machine learning (including various tree-based approaches), deep learning, and outlier detection methods. The benchmark covers 104 publicly available datasets and a few proprietary industrial-systems datasets, which broadens the diversity of the study, allows a more realistic evaluation of algorithm performance, and emphasizes the importance of adaptability to real-world scenarios. The paper dispels the deep learning myth, demonstrating that deep learning, though powerful, is not a universal solution for anomaly detection. We observed that recently proposed tree-based evolutionary algorithms outperform the other methods in many scenarios. We also noticed that tree-based approaches catch a singleton anomaly in a dataset where deep learning methods fail. On the other hand, classical SVM performs best on datasets with more than 10% anomalies, implying that such scenarios are better modeled as a classification problem than as anomaly detection. To our knowledge, no earlier study has compared such a large number of state-of-the-art algorithms on such diverse datasets with the objective of guiding researchers and practitioners in making informed algorithmic choices.
