Log anomaly detection is a critical component of modern software system security and maintenance, serving as an essential basis for system monitoring, operation, and troubleshooting, and helping operations personnel identify and resolve issues in a timely manner. However, current log anomaly detection methods still face challenges such as underutilization of unlabeled data, class imbalance between normal and anomalous data, and high false positive and false negative rates, which limit their effectiveness in anomaly recognition. In this study, we propose DQNLog, a semi-supervised log anomaly detection method that integrates deep reinforcement learning to improve detection performance by leveraging a small amount of labeled data together with large-scale unlabeled data. To address data imbalance and insufficient labeling, we design an anomaly-biased state transition function based on cosine similarity, which steers the agent toward semantically similar anomalies rather than the majority class. To strengthen the model's ability to learn anomalies, we devise a joint reward function that encourages the model to exploit labeled anomalies and explore unlabeled ones, thereby reducing false positives and false negatives. In addition, to prevent the model from deviating from normal trajectories due to misestimation, we introduce a regularization term into the loss function so that the model retains prior knowledge during updates. We evaluate DQNLog on three widely used datasets, demonstrating that it effectively utilizes large-scale unlabeled data and achieves promising results across all experimental datasets.
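The three mechanisms described above (an anomaly-biased state transition via cosine similarity, a joint reward over labeled and unlabeled data, and a regularization term tying updates to prior knowledge) can be illustrated with a minimal sketch. This is not the paper's actual implementation: the function names, the bias probability `p_bias`, the reward values, and the penalty weight `lam` are all illustrative assumptions, and log sequences are represented here as plain embedding tuples.

```python
import math
import random

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors (small epsilon
    # avoids division by zero for degenerate vectors).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-8)

def next_state(pool, anomaly_prototype, p_bias=0.7, rng=random):
    """Anomaly-biased transition: with probability p_bias, jump to the
    candidate whose embedding is most cosine-similar to a known anomaly
    prototype; otherwise sample uniformly, so the agent does not simply
    follow the majority (normal) class."""
    if rng.random() < p_bias:
        return max(pool, key=lambda v: cosine_similarity(v, anomaly_prototype))
    return rng.choice(pool)

def joint_reward(action, label):
    """Joint reward: labeled sequences yield a supervised reward, while
    unlabeled sequences yield a small bonus for flagging an anomaly
    (action 1 = 'anomalous'), encouraging exploration of unlabeled data."""
    if label is not None:
        return 1.0 if action == label else -1.0
    return 0.1 if action == 1 else 0.0

def regularized_loss(td_error, params, prior_params, lam=0.01):
    """Squared TD error plus an L2 penalty that keeps the current
    parameters close to a prior snapshot, so a misestimated update
    cannot pull the model far from previously learned behavior."""
    reg = lam * sum((p - q) ** 2 for p, q in zip(params, prior_params))
    return td_error ** 2 + reg
```

For example, with `p_bias=1.0` the transition always selects the pooled sequence closest to the anomaly prototype, which is the deterministic limit of the biased sampling behavior sketched above.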