Logs play a critical role in providing essential information for system monitoring and troubleshooting. Recently, with the success of pre-trained language models (PLMs) and large language models (LLMs) in natural language processing (NLP), smaller PLMs (such as BERT) and LLMs (like ChatGPT) have become the mainstream approaches for log analysis. While LLMs possess rich knowledge, their high computational cost and unstable performance make them impractical for analyzing logs directly. In contrast, smaller PLMs can be fine-tuned for specific tasks even with limited computational resources, making them more practical. However, these smaller PLMs struggle to understand logs comprehensively due to their limited expert knowledge. To better exploit the knowledge embedded in LLMs for log understanding, this paper introduces a novel knowledge enhancement framework, called LUK, which acquires expert knowledge from LLMs to empower log understanding on a smaller PLM. Specifically, we design a multi-expert collaboration framework based on LLMs, in which different roles work together to acquire expert knowledge. In addition, we propose two novel pre-training tasks that incorporate this expert knowledge into log pre-training. LUK achieves state-of-the-art results on multiple log analysis tasks, and extensive experiments demonstrate that expert knowledge from LLMs can be leveraged more effectively to understand logs.