The rapid growth of deep learning (DL) has spurred interest in improving log-based anomaly detection. DL-based approaches extract semantics from log events (log message templates) and feed them into increasingly sophisticated models. However, because of their complexity, these models depend heavily on training data, labels, and computational resources. Traditional machine learning and data mining techniques are less data-dependent and more efficient, but they are less effective than DL. To make log-based anomaly detection more practical, we aim to enhance traditional techniques so that they match DL's effectiveness. Previous research in a different domain (linking questions on Stack Overflow) suggests that optimized traditional techniques can rival state-of-the-art DL methods. Inspired by this finding, we conducted an empirical study. We optimized unsupervised PCA (Principal Component Analysis), a traditional technique, by incorporating a lightweight semantic-based log representation. This representation addresses the problem of log events that never appear in the training data. We compared seven log-based anomaly detection methods (four DL-based methods, two traditional techniques, and the optimized PCA technique) on public and industrial datasets. The results indicate that the optimized unsupervised PCA technique is about as effective as advanced supervised and semi-supervised DL methods, while being more stable with limited training data and more resource-efficient. This shows that small but well-chosen adaptations can make traditional techniques both adaptable and strong.
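To make the idea of PCA over a semantic log representation more concrete, the following is a minimal sketch and not the implementation evaluated in the study. Several parts are assumptions made for illustration only: `word_vector` is a toy stand-in for pretrained word embeddings, log events are averaged into a sequence vector, the anomaly score is the squared reconstruction error outside the retained principal components, and the threshold is a quantile of the training scores. All names (`DIM`, `event_vector`, `sequence_vector`, `PcaDetector`) and the sample templates are hypothetical.

```python
import zlib
import numpy as np

DIM = 32  # hypothetical embedding size


def word_vector(word: str) -> np.ndarray:
    """Toy stand-in for a pretrained embedding: a fixed pseudo-random vector per word."""
    rng = np.random.default_rng(zlib.crc32(word.lower().encode("utf-8")))
    return rng.standard_normal(DIM)


def event_vector(template: str) -> np.ndarray:
    """Represent a log event by the mean of its word vectors (non-alphabetic tokens dropped)."""
    words = [w for w in template.split() if w.isalpha()]
    if not words:
        return np.zeros(DIM)
    return np.mean([word_vector(w) for w in words], axis=0)


def sequence_vector(events: list[str]) -> np.ndarray:
    """Represent a log sequence by the mean of its event vectors.

    An event never seen in training still gets a vector, because it is
    built from words rather than looked up by event ID.
    """
    return np.mean([event_vector(e) for e in events], axis=0)


class PcaDetector:
    """Unsupervised PCA detector scoring by squared residual outside the principal subspace."""

    def __init__(self, variance_kept: float = 0.95, quantile: float = 0.99):
        self.variance_kept = variance_kept
        self.quantile = quantile

    def fit(self, X: np.ndarray) -> "PcaDetector":
        self.mean_ = X.mean(axis=0)
        _, s, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        ratio = np.cumsum(s**2) / np.sum(s**2)
        # Keep the smallest number of components reaching the variance target.
        k = min(int(np.searchsorted(ratio, self.variance_kept)) + 1, len(s))
        self.components_ = vt[:k]
        # Assumption: threshold taken as a quantile of the training scores.
        self.threshold_ = np.quantile(self.score(X), self.quantile)
        return self

    def score(self, X: np.ndarray) -> np.ndarray:
        Xc = X - self.mean_
        projected = Xc @ self.components_.T @ self.components_
        return np.sum((Xc - projected) ** 2, axis=1)

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.score(X) > self.threshold_


# Hypothetical log templates, used only to exercise the sketch.
normal_a = "Receiving block from source"
normal_b = "Deleting block file"
train_sequences = [
    [normal_a, normal_b],
    [normal_a, normal_a, normal_b],
    [normal_a, normal_b, normal_b],
    [normal_a],
    [normal_b],
    [normal_a, normal_a, normal_a, normal_b],
]
X_train = np.vstack([sequence_vector(s) for s in train_sequences])
detector = PcaDetector().fit(X_train)

# A sequence containing an event that never occurred in training.
unseen = [normal_a, "Disk failure detected on device"]
flags = detector.predict(np.vstack([sequence_vector(unseen)]))
print(flags)
```

With real pretrained embeddings, an unseen event whose wording resembles a known event would land close to it in the vector space, which is what lets a word-based representation cope with unseen log events. The toy vectors here do not capture that similarity; they only show how the pieces fit together.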