Growth in system complexity increases the need for automated log analysis techniques, such as Log-based Anomaly Detection (LAD). While deep learning (DL) methods have been widely used for LAD, traditional machine learning (ML) techniques can also perform well, depending on the context and dataset. Semi-supervised techniques deserve the same attention, as they offer practical advantages over fully supervised methods. Current evaluations mainly focus on detection accuracy, but this alone is insufficient to determine the suitability of a technique for a given LAD task. Other aspects to consider include training and prediction times as well as sensitivity to hyperparameter tuning, both of which matter to engineers in practice. This paper presents a comprehensive empirical study evaluating a wide range of supervised and semi-supervised, traditional and deep ML techniques across four criteria: detection accuracy, time performance, and sensitivity to hyperparameter tuning with respect to both detection accuracy and time performance. The experimental results show that supervised traditional and deep ML techniques fare similarly in terms of detection accuracy and prediction time on most of the benchmark datasets considered in our study. Moreover, the sensitivity analysis with respect to detection accuracy shows that, overall, supervised traditional ML techniques are less sensitive to hyperparameter tuning than deep learning techniques. Further, semi-supervised techniques yield significantly worse detection accuracy than supervised techniques.