A Large-scale Benchmark for Log Parsing

Log data is pivotal to activities such as anomaly detection and failure diagnosis in the automated maintenance of software systems. Because logs are unstructured, log parsing is often required to transform them into a structured format for automated analysis. A variety of log parsers exist, making it vital to benchmark these tools to understand their features and performance. However, existing datasets for log parsing are limited in scale and representativeness, posing challenges for studies that aim to evaluate or develop log parsers. This problem becomes more pronounced when these parsers are evaluated for production use. To address these issues, we introduce a new collection of large-scale annotated log datasets, named LogPub, which more accurately mirrors log data observed in real-world software systems. LogPub comprises 14 datasets, each averaging 3.6 million log lines. Using LogPub, we re-evaluate 15 log parsers in a more rigorous and practical setting. We also propose a new evaluation metric that reduces the sensitivity of current metrics to imbalanced data distributions. Furthermore, we are the first to scrutinize the detailed performance of log parsers on logs that represent rare system events and that carry comprehensive information for system troubleshooting. Parsing such logs accurately is vital yet challenging. We believe that our work could shed light on the design and evaluation of log parsers in more realistic settings, thereby facilitating their deployment in production systems.
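To make the parsing task concrete, the following is a minimal sketch of what "transforming a log into a structured format" can look like: variable fields are masked with a `<*>` placeholder to recover an event template, and the masked values are kept as parameters. The regular expression, the function names `parse_line` and `count_templates`, and the sample log lines are illustrative assumptions; they are not taken from LogPub or from any of the 15 evaluated parsers.

```python
import re
from collections import Counter

# Hypothetical rule set: block IDs, IPv4 addresses (optional port), and
# standalone integers are treated as variable fields. Real parsers use far
# richer heuristics or learned models.
VARIABLE = re.compile(r"blk_-?\d+|\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?|\b\d+\b")


def parse_line(line):
    """Split one raw log message into an event template and its parameters."""
    params = VARIABLE.findall(line)          # values of the variable fields
    template = VARIABLE.sub("<*>", line)     # constant part with placeholders
    return {"template": template, "params": params}


def count_templates(lines):
    """Count how many log lines fall under each recovered template."""
    return Counter(parse_line(line)["template"] for line in lines)


if __name__ == "__main__":
    # Made-up sample messages, for illustration only.
    logs = [
        "Received block blk_-1608999687919862906 of size 91178 from 10.250.19.102",
        "Received block blk_7503483334202473044 of size 233217 from 10.251.215.16",
        "Deleting block blk_-1608999687919862906 file /tmp/data",
    ]
    for template, count in count_templates(logs).items():
        print(count, template)
```

Grouping lines by template, as `count_templates` does, also shows why template frequencies can be highly imbalanced: a few templates may cover most lines while rare events appear only a handful of times.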

