Log-based anomaly detection is critical for ensuring the stability and reliability of web systems. A key obstacle is the lack of sufficient labeled logs, which hinders rapid deployment on new systems. Existing works typically combine large-scale labeled logs from a mature web system with a small amount of labeled logs from a new system, using transfer learning to extract general knowledge shared across both domains. However, these methods focus solely on transferring general knowledge and neglect the disparity and potential mismatch between that knowledge and the proprietary knowledge of the target system, which constrains performance. To address this limitation, we propose FusionLog, a novel zero-label cross-system log-based anomaly detection method that fuses general and proprietary knowledge, enabling cross-system generalization without any labeled target logs. Specifically, we first design a training-free router based on semantic similarity that dynamically partitions unlabeled target logs into 'general logs' and 'proprietary logs.' For general logs, FusionLog employs a small model based on system-agnostic representation meta-learning for direct training and inference, inheriting the general anomaly patterns shared between the source and target systems. For proprietary logs, we iteratively generate pseudo-labels and fine-tune the small model through multi-round collaborative knowledge distillation and fusion between a large language model (LLM) and the small model (SM), enhancing its ability to recognize anomaly patterns specific to the target system. Experimental results on three public log datasets from different systems show that FusionLog achieves an F1-score above 90% under a fully zero-label setting, significantly outperforming state-of-the-art cross-system log-based anomaly detection methods.
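The routing step can be illustrated with a minimal sketch. The abstract does not specify the embedding model, the similarity measure, or the decision rule, so everything below is an assumption made for illustration: a bag-of-words vector stands in for a semantic embedding, cosine similarity is the measure, and a fixed `threshold` of 0.5 decides the partition. The names `embed`, `cosine`, and `route` are hypothetical and are not taken from FusionLog.

```python
import math
import re
from collections import Counter


def embed(message: str) -> Counter:
    # Stand-in "embedding" (assumption): counts of lowercase alphabetic tokens.
    # A real system would likely use a learned sentence encoder instead.
    return Counter(re.findall(r"[a-z]+", message.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(target_logs, source_logs, threshold=0.5):
    # Training-free partition: a target log is "general" if it is close enough
    # to at least one source log, otherwise "proprietary".
    # The threshold value is an illustrative assumption.
    source_vecs = [embed(s) for s in source_logs]
    general, proprietary = [], []
    for log in target_logs:
        vec = embed(log)
        best = max((cosine(vec, s) for s in source_vecs), default=0.0)
        (general if best >= threshold else proprietary).append(log)
    return general, proprietary


if __name__ == "__main__":
    source = [
        "Connection refused by remote host 10.0.0.1",
        "Disk write failed on volume 3",
    ]
    target = [
        "connection refused by host 10.0.0.7",
        "GPU kernel scheduler stalled",
    ]
    general, proprietary = route(target, source)
    print("general:", general)
    print("proprietary:", proprietary)
```

In this toy run the first target log shares most of its tokens with a source log and is routed as general, while the second shares none and is routed as proprietary. Because no parameters are learned, the router needs no target labels, which is the property the abstract relies on.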