On the Effectiveness of Log Representation for Log-based Anomaly Detection

Logs are an essential source of information for understanding the running status of a software system. As modern software architectures and maintenance methods evolve, more research effort has been devoted to automated log analysis, and machine learning (ML) in particular has been widely used in log analysis tasks. In ML-based log analysis, converting textual log data into numerical feature vectors is a critical and indispensable step. However, the impact of different log representation techniques on the performance of downstream models is not clear, which limits the ability of researchers and practitioners to choose the optimal log representation technique for their automated log analysis workflows. This work therefore investigates and compares the log representation techniques commonly adopted in previous log analysis research. Specifically, we select six log representation techniques and evaluate them with seven ML models on four public log datasets (HDFS, BGL, Spirit, and Thunderbird) in the context of log-based anomaly detection. We also examine the impact of the log parsing process and of different feature aggregation approaches when they are used together with log representation techniques. From the experiments, we derive heuristic guidelines for researchers and developers to follow when designing an automated log analysis workflow. We believe our comprehensive comparison can help researchers and practitioners better understand the characteristics of different log representation techniques and guide them in selecting the most suitable ones for their ML-based log analysis workflows.
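To make the representation step concrete, the following is a minimal sketch of one simple way to turn textual logs into numerical feature vectors: counting how often each log template occurs within a session. It is an illustration only; the abstract does not name the six techniques evaluated, and this sketch is not claimed to be one of them. The helper names `to_template` and `count_vectors`, the digit-masking rule that stands in for a real log parser, and the sample messages are all hypothetical.

```python
import re
from collections import Counter


def to_template(message: str) -> str:
    """Crude stand-in for a log parser (an assumption, not a real parser):
    any whitespace-separated token containing a digit is treated as a
    variable and masked with <*>."""
    return " ".join(
        "<*>" if re.search(r"\d", tok) else tok for tok in message.split()
    )


def count_vectors(sessions):
    """Aggregate event-level templates into one count vector per session.

    `sessions` is a list of sessions, each a list of raw log messages.
    Returns the sorted template vocabulary and one vector per session,
    where position i holds the count of vocabulary template i.
    """
    templated = [[to_template(m) for m in msgs] for msgs in sessions]
    vocab = sorted({t for session in templated for t in session})
    vectors = []
    for session in templated:
        counts = Counter(session)  # missing templates count as 0
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors


# Hypothetical sample messages, loosely styled after block-storage logs.
sessions = [
    [
        "Receiving block blk_1 src: 10.0.0.1",
        "Receiving block blk_2 src: 10.0.0.2",
        "Deleting block blk_1",
    ],
    ["Receiving block blk_3 src: 10.0.0.3"],
]

vocab, vectors = count_vectors(sessions)
print(vocab)
print(vectors)
```

The resulting vectors could then be passed to any downstream anomaly detection model. The sketch also shows where the two factors mentioned above enter: the parsing step determines the vocabulary, and the per-session counting is one possible form of feature aggregation.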
