EvLog: Identifying Anomalous Logs over Software Evolution

Software logs record system activities, helping maintainers identify the underlying causes of failures and mitigate them promptly. However, maintainers must inspect a large volume of daily logs to find the anomalous logs that reveal failure details for further diagnosis. Automatically distinguishing these anomalous logs from normal logs has therefore become a critical problem. Existing approaches alleviate the burden on software maintainers, but they rest on an improper yet critical assumption: that logging statements in the software remain unchanged. In practice software keeps evolving, and our empirical study finds that this evolution brings three challenges: log parsing errors, evolving log events, and unstable log sequences. In this paper, we propose a novel unsupervised approach named Evolving Log analyzer (EvLog) to mitigate these challenges. We first build a multi-level representation extractor that processes logs without parsing, which avoids errors introduced by a log parser. The multi-level representations preserve the essential semantics of logs while leaving out insignificant changes in evolving events. EvLog then implements an anomaly discriminator with an attention mechanism to identify the anomalous logs and avoid the issues caused by unstable sequences. On two real-world system evolution log datasets, EvLog achieves an average F1 score of 0.955 in the intra-version setting and 0.847 in the inter-version setting, outperforming other state-of-the-art approaches by a wide margin. To the best of our knowledge, this is the first study on localizing anomalous logs over software evolution. We believe our work sheds new light on the impact of software evolution on log analysis and offers corresponding solutions for the log analysis community.
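
To make the "evolving log events" challenge concrete, the following is a minimal sketch and not EvLog's actual extractor or discriminator. It assumes hypothetical log messages invented for illustration, and it swaps in a plain token-set Jaccard similarity as a stand-in for a semantic representation. The helper names `tokenize` and `jaccard` are likewise assumptions. It only shows that an exact match on the old message fails after a small wording change, while a parsing-free, token-level comparison still treats the two messages as closely related.

```python
import re


def tokenize(message: str) -> set:
    """Lowercase a log message and keep alphabetic tokens only.

    Numbers, IP addresses, and ID suffixes are dropped, so variable
    parts of a message do not dominate the comparison.
    """
    return set(re.findall(r"[a-z]+", message.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical log messages, invented for this illustration.
old_event = "Receiving block blk_123 from 10.0.0.1"
# Assumed wording of the same event after a code change in a later version.
new_event = "Receiving data block blk_123 from node 10.0.0.1"
# An unrelated message, for contrast.
other_event = "Exception in thread main: connection refused"

# Exact matching treats the evolved message as a brand-new event.
exact_match = old_event == new_event

# Token-level similarity still relates the evolved message to the old one.
sim_evolved = jaccard(tokenize(old_event), tokenize(new_event))
sim_other = jaccard(tokenize(old_event), tokenize(other_event))

print(f"exact match: {exact_match}")
print(f"similarity to evolved event: {sim_evolved:.2f}")
print(f"similarity to unrelated event: {sim_other:.2f}")
```

A learned semantic representation, as the abstract describes, would go further than token overlap, for example by relating synonyms. The sketch only motivates why comparing content is more robust to evolution than matching fixed templates.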
