Software logs record system activities, helping maintainers identify the underlying causes of failures and take prompt mitigation actions. However, maintainers must inspect a large volume of daily logs to find the anomalous logs that reveal failure details for further diagnosis. Automatically distinguishing these anomalous logs from normal logs is therefore a critical problem. Existing approaches alleviate the burden on software maintainers, but they rest on an improper yet critical assumption: that logging statements in the software remain unchanged. In practice, software keeps evolving, and our empirical study finds that this evolution brings three challenges: log parsing errors, evolving log events, and unstable log sequences. In this paper, we propose a novel unsupervised approach named Evolving Log analyzer (EvLog) to mitigate these challenges. We first build a multi-level representation extractor that processes logs without parsing, which prevents errors introduced by the parser. The multi-level representations preserve the essential semantics of logs while leaving out insignificant changes in evolving events. EvLog then implements an anomaly discriminator with an attention mechanism to identify anomalous logs and avoid the issues caused by unstable sequences. EvLog is effective on two real-world system evolution log datasets, with average F1 scores of 0.955 and 0.847 in the intra-version and inter-version settings, respectively, outperforming other state-of-the-art approaches by a wide margin. To the best of our knowledge, this is the first study to tackle anomalous logs over software evolution. We believe our work sheds new light on the impact of software evolution and offers corresponding solutions for the log analysis community.
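To make the idea of processing logs without parsing more concrete, the following is a minimal sketch and not EvLog's actual extractor. It assumes a plain bag-of-words representation over alphabetic tokens in place of the multi-level representations described above. The helper names (`tokenize`, `represent`, `cosine`) and the three log messages are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(message: str) -> list[str]:
    """Lowercase a raw log message and keep only alphabetic tokens.

    Digits and punctuation are dropped, so volatile parameters such as
    block ids, IP addresses, and ports do not affect the representation.
    No log parser or event template is involved.
    """
    return re.findall(r"[a-z]+", message.lower())


def represent(message: str) -> Counter:
    """Bag-of-words representation (an assumed stand-in, not EvLog's)."""
    return Counter(tokenize(message))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words representations."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical messages: the same event before and after a logging
# statement was edited, plus an unrelated event.
old_event = "Receiving block blk_123 src: /10.0.0.1:50010"
new_event = "Receiving block blk_987 from src: /10.0.0.7:50010 to dest"
unrelated = "Exception in thread main java.io.IOException"

if __name__ == "__main__":
    print(cosine(represent(old_event), represent(new_event)))
    print(cosine(represent(old_event), represent(unrelated)))
```

In this toy setting, the edited logging statement stays close to its earlier version, while the unrelated event shares no tokens with it. A learned semantic representation, as the abstract describes, would be expected to capture such similarity more robustly than raw token overlap.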