Log data plays a pivotal role in automated software maintenance tasks such as anomaly detection and failure diagnosis. Because logs are unstructured, log parsing is often required to transform them into a structured format for automated analysis. Since a variety of log parsers exist, benchmarking them is vital for understanding their characteristics and performance. However, existing log parsing datasets are limited in scale and representativeness, which hampers studies that aim to evaluate or develop log parsers. This problem is especially pronounced when parsers are assessed for production use. To address these issues, we introduce LogPub, a new collection of large-scale annotated log datasets that more accurately reflects the log data observed in real-world software systems. LogPub comprises 14 datasets, each containing 3.6 million log lines on average. Using LogPub, we re-evaluate 15 log parsers in a more rigorous and practical setting. We also propose a new evaluation metric that reduces the sensitivity of existing metrics to imbalanced data distributions. Furthermore, we are the first to examine in detail how log parsers perform on logs that represent rare system events and provide comprehensive information for system troubleshooting; parsing such logs accurately is vital yet challenging. We believe our work can shed light on the design and evaluation of log parsers in more realistic settings, thereby facilitating their deployment in production systems.
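The abstract does not define the proposed metric. As a rough illustration of why imbalanced data distributions matter, the following Python sketch contrasts message-level grouping accuracy, where every log line counts once, with an assumed template-level F1 score, where every template group counts once regardless of its size. The function names `grouping_accuracy` and `template_f1` and the toy data are hypothetical; this is an illustration under those assumptions, not necessarily the metric used with LogPub.

```python
from collections import defaultdict


def groups(labels):
    """Return the set of groups, each a frozenset of message indices sharing a label."""
    g = defaultdict(set)
    for i, lab in enumerate(labels):
        g[lab].add(i)
    return {frozenset(s) for s in g.values()}


def grouping_accuracy(truth, pred):
    """Message-level: fraction of messages whose predicted group exactly equals a true group."""
    true_groups = groups(truth)
    correct = sum(len(grp) for grp in groups(pred) if grp in true_groups)
    return correct / len(truth)


def template_f1(truth, pred):
    """Template-level (assumed): F1 over groups, each group counted once."""
    true_groups, pred_groups = groups(truth), groups(pred)
    hits = len(true_groups & pred_groups)  # groups reproduced exactly
    if hits == 0:
        return 0.0
    precision = hits / len(pred_groups)
    recall = hits / len(true_groups)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one frequent template (98 lines) and two rare
# templates (1 line each) that the parser wrongly merges into one group.
truth = ["A"] * 98 + ["B", "C"]
pred = ["x"] * 98 + ["y", "y"]

print(round(grouping_accuracy(truth, pred), 2))  # 0.98
print(round(template_f1(truth, pred), 2))        # 0.4
```

In this toy data, mishandling the two rare templates costs only 2% of grouping accuracy but drops the template-level F1 to 0.4, which is the kind of gap a metric less sensitive to imbalance is meant to expose.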