Automatic log file analysis enables early detection of relevant incidents such as system failures. In particular, self-learning anomaly detection techniques capture patterns in log data and subsequently report unexpected log event occurrences to system operators, without the need to provide or manually model anomalous scenarios in advance. Recently, an increasing number of approaches that leverage deep learning neural networks for this purpose have been presented. These approaches have demonstrated superior detection performance compared to conventional machine learning techniques and simultaneously resolve issues with unstable data formats. However, there are many different deep learning architectures, and it is non-trivial to encode raw, unstructured log data so that it can be analyzed by neural networks. We therefore carry out a systematic literature review that provides an overview of deployed models, data pre-processing mechanisms, anomaly detection techniques, and evaluations. The survey does not quantitatively compare existing approaches; instead, it aims to help readers understand relevant aspects of different model architectures and emphasizes open issues for future work.