Log parsing, which transforms raw log messages into structured formats, is a fundamental step in automated log analysis. Existing syntax-based parsers struggle with complex logs because they lack semantic reasoning ability. Emerging LLM-powered semantic parsers achieve high accuracy but suffer from prohibitive latency and token costs because they apply semantic inference to every log. Our key observation is that not all logs require complex semantic understanding: the vast majority exhibit repetitive patterns that can be extracted through straightforward statistical analysis. Driven by this insight, we propose CelerLog, a fast and effective log parser. CelerLog introduces a dynamic routing mechanism that classifies logs into dense and sparse groups. Dense groups, whose logs show strong statistical patterns, are processed by an efficient statistical processor, whereas sparse groups, which lack such patterns, are routed to an LLM for semantic inference. This hybrid strategy avoids unnecessary LLM invocations. Extensive experiments on 14 public datasets show that CelerLog achieves leading performance over state-of-the-art baselines while running 7.9x to 18.6x faster than LLM-based methods and up to 1.5x faster than Drain. It also cuts cost, reducing token consumption by 80.2% to 94.1% and LLM invocations by 86.4% to 90.9%.
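To make the dense/sparse routing idea concrete, the following is a minimal sketch and not CelerLog's actual algorithm, which the abstract does not specify. It assumes a hypothetical grouping key (token count plus first token), a hypothetical size threshold `dense_threshold` for deciding that a group is dense, a simple position-wise comparison as the statistical processor, and a caller-supplied `llm_parse` function standing in for the LLM.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

WILDCARD = "<*>"  # placeholder for variable fields (an assumed convention)


def group_key(log: str) -> Tuple[int, str]:
    # Hypothetical grouping key: number of tokens plus the first token.
    tokens = log.split()
    return (len(tokens), tokens[0] if tokens else "")


def statistical_template(logs: List[str]) -> str:
    # Assumed statistical processor: compare tokens position by position,
    # keep tokens that never change, and replace the rest with a wildcard.
    # All logs in a group share a token count, so zip() drops nothing.
    columns = zip(*(log.split() for log in logs))
    return " ".join(col[0] if len(set(col)) == 1 else WILDCARD for col in columns)


def route(
    logs: List[str],
    llm_parse: Callable[[str], str],
    dense_threshold: int = 3,  # hypothetical cut-off, not from the paper
) -> Tuple[Dict[str, str], int]:
    """Return a log -> template mapping and the number of LLM calls made."""
    groups: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for log in logs:
        groups[group_key(log)].append(log)

    templates: Dict[str, str] = {}
    llm_calls = 0
    for members in groups.values():
        if len(members) >= dense_threshold:
            # Dense group: one cheap statistical pass covers every member.
            template = statistical_template(members)
            for log in members:
                templates[log] = template
        else:
            # Sparse group: fall back to semantic inference per log.
            for log in members:
                templates[log] = llm_parse(log)
                llm_calls += 1
    return templates, llm_calls
```

In this sketch the LLM is invoked only for logs that fall into small groups, which is the mechanism by which a hybrid parser of this kind would save tokens and latency. A real system would need a more careful density criterion than raw group size.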