Logs generated by large-scale software systems give engineers crucial information for understanding system status and diagnosing problems. Log parsing, which converts raw log messages into structured data, is the first step towards automated log analytics. Existing log parsers use statistical features to extract the common part of log messages as the log template. However, these parsers often fail to identify the correct templates and parameters because 1) they often overlook the semantic meaning of log messages, and 2) they require domain-specific knowledge for different log datasets. To address these limitations, we propose LogPPT, which captures the patterns of templates using prompt-based few-shot learning. LogPPT utilises a novel prompt tuning method to recognise keywords and parameters from a small amount of labelled log data. In addition, an adaptive random sampling algorithm is designed to select a small yet diverse training set. We have conducted extensive experiments on 16 public log datasets. The experimental results show that LogPPT is effective and efficient for log parsing.
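To make the task concrete, the following minimal sketch shows what a log parser is expected to produce: a template in which variable parts are replaced by a placeholder, plus the extracted parameters. It is not LogPPT; it is a toy rule-based baseline of the kind that relies on handcrafted, domain-specific patterns. The function name `parse_log`, the `<*>` placeholder, the masking rules, and the sample message are all assumptions made for illustration.

```python
import re

# Hypothetical masking rules, chosen for illustration only.
# Real rule-based parsers need such patterns tuned per dataset.
PARAM_PATTERNS = [
    re.compile(r"\d{1,3}(?:\.\d{1,3}){3}"),  # IPv4 address
    re.compile(r"blk_-?\d+"),                # block identifier
    re.compile(r"\d+"),                      # plain integer
]


def parse_log(message):
    """Split a raw log message into (template, parameters).

    A token is treated as a parameter if it fully matches one of the
    handcrafted patterns; every other token is kept as a keyword.
    """
    template_tokens = []
    parameters = []
    for token in message.split():
        if any(p.fullmatch(token) for p in PARAM_PATTERNS):
            template_tokens.append("<*>")
            parameters.append(token)
        else:
            template_tokens.append(token)
    return " ".join(template_tokens), parameters


if __name__ == "__main__":
    raw = "Received block blk_3587 of size 67108864 from 10.251.42.84"
    template, params = parse_log(raw)
    print(template)
    print(params)
```

A parser of this kind only recognises parameters that its rules anticipate, which illustrates the two limitations named above: it ignores what the tokens mean, and its patterns must be rewritten for each new log dataset.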