Log Parsing Evaluation in the Era of Modern Software Systems

Due to the complexity and size of modern software systems, the volume of logs generated is tremendous. It is therefore infeasible to investigate these data manually in a reasonable time, which calls for automated log analysis to derive insights about the functioning of the systems. Motivated by an industry use case, we zoom in on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs. Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs. We show this by assessing the 14 most-recognized log parsing approaches in the literature using (i) nine publicly available datasets, (ii) one dataset composed of combined publicly available data, and (iii) one dataset generated within the infrastructure of a large bank. Subsequently, toward improving log parsing robustness in real-world production scenarios, we propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts by generating synthetic log data that resemble industry logs. Our contributions serve as a foundation to consolidate past research efforts, facilitate future research advancements, and establish a strong link between research and industry log parsing.

