Real-time log analysis is the cornerstone of observability for modern infrastructure. However, existing online parsers are architecturally unsuited to the dynamism of production environments. Built on fundamentally static template models, they are dangerously brittle: minor schema drift silently breaks parsing pipelines, leading to lost alerts and operational toil. We propose \textbf{KELP} (\textbf{K}elp \textbf{E}volutionary \textbf{L}og \textbf{P}arser), a high-throughput parser built on a novel data structure, the Evolutionary Grouping Tree. Unlike heuristic approaches that rely on fixed rules, KELP treats template discovery as a continuous online clustering process. As logs arrive, the tree evolves: nodes split, merge, and re-evaluate roots as frequency distributions change. Validating this adaptability requires a dataset that models realistic production complexity, yet we find that standard benchmarks rely on static, regex-based ground truths that fail to reflect it. To enable rigorous evaluation, we introduce a new benchmark designed to capture the structural ambiguity of modern production systems. Our evaluation shows that KELP maintains high accuracy on this benchmark, where traditional heuristic methods fail, without compromising throughput. Our code and dataset are available at codeberg.org/stonebucklabs/kelp.