Log-based anomaly detection is critical for ensuring the stability and reliability of web systems. A key problem in this task is the lack of sufficient labeled logs, which limits rapid deployment in new systems. Existing works usually combine large-scale labeled logs from a mature web system with a small amount of labeled logs from a new system, using transfer learning to extract and generalize knowledge shared across both domains. However, these methods focus solely on transferring general knowledge and neglect the disparity and potential mismatch between that knowledge and the proprietary knowledge of the target system, which constrains performance. To address this limitation, we propose FusionLog, a novel zero-label cross-system log-based anomaly detection method that fuses general and proprietary knowledge, enabling cross-system generalization without any labeled target logs. Specifically, we first design a training-free router based on semantic similarity that dynamically partitions unlabeled target logs into "general logs" and "proprietary logs." For general logs, FusionLog employs a small model based on system-agnostic representation meta-learning for direct training and inference, inheriting the general anomaly patterns shared between the source and target systems. For proprietary logs, we iteratively generate pseudo-labels and fine-tune the small model through multi-round collaborative knowledge distillation and fusion between a large language model (LLM) and the small model (SM), enhancing its ability to recognize anomaly patterns specific to the target system. Experimental results on three public log datasets from different systems show that FusionLog achieves an F1-score above 90% under a fully zero-label setting, significantly outperforming state-of-the-art cross-system log-based anomaly detection methods.
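To make the routing step concrete, the following is a minimal sketch of a training-free, similarity-based router. It is not FusionLog's implementation: the abstract does not specify the encoder, the similarity measure, or the decision rule, so the bag-of-words `embed` function, the cosine measure, the `threshold` value of 0.5, and the names `route`, `embed`, and `cosine` are all assumptions made for illustration. A real system would presumably replace `embed` with a pretrained semantic encoder.

```python
import math
import re
from collections import Counter


def embed(message: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a semantic encoder)."""
    return Counter(re.findall(r"[a-z]+", message.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(target_logs, source_logs, threshold=0.5):
    """Split unlabeled target logs into (general, proprietary).

    A target log counts as "general" when its best similarity to any
    source-system log reaches the threshold; otherwise it is treated as
    "proprietary". The threshold is a hypothetical parameter, not a value
    taken from the paper.
    """
    source_vecs = [embed(s) for s in source_logs]
    general, proprietary = [], []
    for log in target_logs:
        vec = embed(log)
        best = max((cosine(vec, s) for s in source_vecs), default=0.0)
        (general if best >= threshold else proprietary).append(log)
    return general, proprietary
```

No parameters are learned here, which is the sense in which such a router is training-free: the partition depends only on the fixed encoder and the chosen threshold.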