Log parsing serves as an essential prerequisite for various log analysis tasks. Recent advancements in this field have improved parsing accuracy by leveraging the semantics in logs through fine-tuning large language models (LLMs) or learning from in-context demonstrations. However, these methods heavily depend on labeled examples to achieve optimal performance. In practice, collecting sufficient labeled data is challenging due to the large scale and continuous evolution of logs, leading to performance degradation of existing log parsers after deployment. To address this issue, we propose LUNAR, an unsupervised LLM-based method for efficient and off-the-shelf log parsing. Our key insight is that while LLMs may struggle with direct log parsing, their performance can be significantly enhanced through comparative analysis across multiple logs that differ only in their parameter parts. We refer to such groups of logs as Log Contrastive Units (LCUs). Given the vast volume of logs, obtaining LCUs is difficult. Therefore, LUNAR introduces a hybrid ranking scheme to effectively search for LCUs by jointly considering the commonality and variability among logs. Additionally, LUNAR crafts a novel parsing prompt for LLMs to identify contrastive patterns and extract meaningful log structures from LCUs. Experiments on large-scale public datasets demonstrate that LUNAR significantly outperforms state-of-the-art log parsers in terms of accuracy and efficiency, providing an effective and scalable solution for real-world deployment.\footnote{The code and data are available at \url{https://github.com/Jun-jie-Huang/LUNAR}.}
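To make the notion of a Log Contrastive Unit concrete, the following is a minimal illustrative sketch, not LUNAR's implementation. LUNAR passes LCUs to an LLM through a parsing prompt; the sketch swaps in a plain token-wise comparison to show why logs that differ only in their parameter parts expose the template. The function name `contrast_template`, the `<*>` placeholder, the whitespace tokenization, and the sample log lines are all assumptions made for illustration.

```python
from typing import List


def contrast_template(logs: List[str], placeholder: str = "<*>") -> str:
    """Derive a template from logs assumed to share one template.

    Toy stand-in for contrastive analysis: tokens that are identical
    across all logs are kept, tokens that vary become a placeholder.
    """
    token_rows = [log.split() for log in logs]
    # This sketch only handles logs with the same number of tokens.
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("logs must be non-empty and have equal token counts")

    template = []
    for column in zip(*token_rows):
        # A column with a single distinct value is constant text.
        template.append(column[0] if len(set(column)) == 1 else placeholder)
    return " ".join(template)


# Hypothetical log lines forming one contrastive unit.
lcu = [
    "Connection from 10.0.0.1 closed after 35 ms",
    "Connection from 10.0.0.7 closed after 120 ms",
    "Connection from 192.168.1.4 closed after 8 ms",
]
print(contrast_template(lcu))  # Connection from <*> closed after <*> ms
```

A single log gives no contrast, so every token is treated as constant text; this is one way to see why grouping logs that vary in their parameters carries more signal than parsing each log in isolation.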