Small is Beautiful: A Practical and Efficient Log Parsing Framework

Log parsing is a fundamental step in log analysis: it splits each raw log message into a constant template and its dynamic variables. Recent semantic-based parsers built on Large Language Models (LLMs) generalize better than traditional syntax-based methods, but their effectiveness depends heavily on model scale, and performance collapses when smaller, more resource-efficient LLMs are used. This degradation is a major barrier to real-world adoption, where data privacy requirements and computational constraints call for compact models. To bridge this gap, we propose EFParser, an unsupervised LLM-based log parser designed to enhance the capabilities of smaller models through systematic architectural innovation. EFParser introduces a dual-cache system with an adaptive updating mechanism that distinguishes novel patterns from variations of existing templates. This allows the parser to merge redundant templates and rectify earlier errors, keeping the cache consistent. In addition, a dedicated correction module acts as a gatekeeper, validating and refining every LLM-generated template before it is cached so that errors do not enter the cache. Empirical evaluations on public large-scale datasets show that, when running on smaller LLMs, EFParser outperforms state-of-the-art baselines by an average of 12.5% across all metrics and even surpasses some baselines that use large-scale models. Despite its additional validation steps, EFParser remains computationally efficient, offering a robust and practical solution for real-world log analysis deployment.
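
To make the template/variable split and the cache-update idea concrete, the following is a minimal illustrative sketch, not EFParser's actual implementation. The abstract does not specify how the two caches are organized, how templates are validated, or how merges are decided, so everything here is an assumption: a single cache stands in for the dual-cache system, `propose` is a hypothetical stand-in for the LLM call, the gate only checks that a proposed template matches the log that produced it, and two templates are merged when they have the same token count and differ in at most `max_diff` positions.

```python
import re

WILDCARD = "<*>"  # placeholder for a dynamic variable (assumed notation)


def template_to_regex(template):
    # Escape the constant parts and capture one non-space run per wildcard.
    parts = [re.escape(p) for p in template.split(WILDCARD)]
    return re.compile("^" + r"(\S+)".join(parts) + "$")


def match(template, log):
    # Return the extracted variables, or None if the template does not fit.
    m = template_to_regex(template).match(log)
    return list(m.groups()) if m else None


def merge(a, b, max_diff=1):
    # Hypothetical merge rule: same token count, at most max_diff differing
    # positions; differing positions become wildcards.
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb):
        return None
    diff = {i for i, (x, y) in enumerate(zip(ta, tb)) if x != y}
    if len(diff) > max_diff:
        return None
    return " ".join(WILDCARD if i in diff else t for i, t in enumerate(ta))


class TemplateCache:
    def __init__(self, propose):
        self.templates = []
        self.propose = propose  # stand-in for the LLM call (assumption)

    def parse(self, log):
        # 1. Cache hit: reuse an existing template, no model call needed.
        for t in self.templates:
            variables = match(t, log)
            if variables is not None:
                return t, variables
        # 2. Cache miss: ask the model for a candidate template.
        candidate = self.propose(log)
        # 3. Gate: a template that cannot reproduce its own log is rejected;
        #    fall back to the literal line (a template with no variables).
        if match(candidate, log) is None:
            candidate = log
        # 4. Update: fold the candidate into a near-duplicate template if one
        #    exists, otherwise store it as a new pattern.
        for i, t in enumerate(self.templates):
            merged = merge(t, candidate)
            if merged is not None:
                self.templates[i] = merged
                return merged, match(merged, log)
        self.templates.append(candidate)
        return candidate, match(candidate, log)


# A deliberately weak "model" that never abstracts any variable.
cache = TemplateCache(propose=lambda log: log)
cache.parse("Connection from 10.0.0.1 closed")
print(cache.parse("Connection from 10.0.0.2 closed"))
print(cache.templates)
```

Even with a proposer that never identifies a variable, the second line causes the two literal templates to be merged into `Connection from <*> closed`, after which further lines of that shape are cache hits. The one-token merge rule is intentionally naive: it would also merge lines such as `login failed` and `login succeeded`, which is exactly the kind of decision (new pattern versus variation of an existing template) that the abstract attributes to the adaptive updating mechanism.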
