Log parsing, the first and a vital step in automated log analysis, extracts log templates from semi-structured logs to produce structured logs. However, current log parsers are limited in effectiveness for two main reasons. First, traditional data-driven log parsers rely heavily on heuristics or handcrafted features designed by domain experts, which may not perform consistently well across diverse log systems. Second, existing deep learning-based log parsers require model tuning, which is typically confined to the training samples and leads to suboptimal performance across the entire log source. To overcome these limitations, we propose LogDiv, a precise log parsing framework that leverages the in-context inference capability of large language models. Specifically, LogDiv extracts the hidden semantics from multiple log examples through prompt demonstrations. Without any model tuning, LogDiv directly generates a log template for the target log message using the semantics provided in the prompt context. Additionally, we introduce a simple yet effective prompt format for extracting the output and enhancing the quality of the generated log templates. To validate LogDiv, we conducted experiments on 16 widely used public datasets. The results show that LogDiv achieves state-of-the-art performance, with an average parsing accuracy of 97.7%, precision template accuracy of 88.1%, and recall template accuracy of 90.8%.
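To make the in-context idea concrete, the following is a minimal sketch of how demonstrations might be assembled into a prompt and how a template might be pulled out of a model completion. It is not LogDiv's actual prompt format, which the abstract does not specify. The `<START>`/`<END>` markers, the `<*>` wildcard for variable parts, the instruction wording, and the function names `build_prompt` and `extract_template` are all assumptions made for illustration, and the call to a language model is left out.

```python
def build_prompt(demos, target_log):
    """Assemble an in-context prompt from (log, template) demonstration pairs.

    The instruction text and the <START>/<END> markers are hypothetical;
    they only illustrate delimiting the answer so it can be extracted later.
    """
    lines = ["Extract the log template. Replace variable parts with <*>.", ""]
    for log, template in demos:
        lines.append(f"Log: {log}")
        lines.append(f"Template: <START> {template} <END>")
        lines.append("")
    # The prompt ends with an open marker so the model completes the template.
    lines.append(f"Log: {target_log}")
    lines.append("Template: <START>")
    return "\n".join(lines)


def extract_template(completion):
    """Return the text before the first <END>, dropping any echoed <START>."""
    text = completion.split("<END>")[0]
    text = text.split("<START>")[-1]
    # Collapse runs of whitespace so templates compare consistently.
    return " ".join(text.split())


if __name__ == "__main__":
    demos = [
        ("Connection from 10.0.0.5 closed", "Connection from <*> closed"),
        ("Connection from 10.0.0.9 closed", "Connection from <*> closed"),
    ]
    prompt = build_prompt(demos, "Connection from 192.168.1.7 closed")
    print(prompt)
    # A made-up completion standing in for a real model response.
    fake_completion = " Connection from <*> closed <END>\nLog: ..."
    print(extract_template(fake_completion))
```

Delimiting the answer with explicit markers means any extra text the model produces after the template can be discarded with simple string handling, with no tuning of the model itself.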