Graph Neural Networks based Log Anomaly Detection and Explanation

Event logs are widely used to record the status of high-tech systems, making log anomaly detection important for monitoring those systems. Most existing log anomaly detection methods take a log event count matrix or log event sequences as input, exploiting quantitative and/or sequential relationships between log events to detect anomalies. Unfortunately, only considering quantitative or sequential relationships may result in low detection accuracy. To alleviate this problem, we propose a graph-based method for unsupervised log anomaly detection, dubbed Logs2Graphs, which first converts event logs into attributed, directed, and weighted graphs, and then leverages graph neural networks to perform graph-level anomaly detection. Specifically, we introduce One-Class Digraph Inception Convolutional Networks, abbreviated as OCDiGCN, a novel graph neural network model for detecting graph-level anomalies in a collection of attributed, directed, and weighted graphs. By coupling the graph representation and anomaly detection steps, OCDiGCN can learn a representation that is especially suited for anomaly detection, resulting in a high detection accuracy. Importantly, for each identified anomaly, we additionally provide a small subset of nodes that play a crucial role in OCDiGCN's prediction as explanations, which can offer valuable cues for subsequent root cause diagnosis. Experiments on five benchmark datasets show that Logs2Graphs performs at least on par with state-of-the-art log anomaly detection methods on simple datasets while largely outperforming state-of-the-art log anomaly detection methods on complicated datasets.

翻译：事件日志被广泛用于记录高科技系统的运行状态，因此日志异常检测对于监控这些系统具有重要意义。现有的大多数日志异常检测方法以日志事件计数矩阵或日志事件序列作为输入，利用日志事件之间的数量关系和/或顺序关系来检测异常。然而，仅考虑数量关系或顺序关系可能导致检测精度较低。为解决这一问题，我们提出了一种基于图的非监督日志异常检测方法，命名为Logs2Graphs，该方法首先将事件日志转换为带属性、有向且带权重的图，随后利用图神经网络进行图级别的异常检测。具体而言，我们引入了一类有向图初始卷积网络（One-Class Digraph Inception Convolutional Networks，简称OCDiGCN），这是一种新颖的图神经网络模型，用于在一组带属性、有向且带权重的图中检测图级异常。通过将图表示与异常检测步骤相结合，OCDiGCN能够学习一种特别适用于异常检测的表示，从而实现高检测精度。更重要的是，对于每一个识别出的异常，我们还额外提供了一小部分在OCDiGCN预测中起关键作用的节点作为解释，这可为后续的根因诊断提供有价值的线索。在五个基准数据集上的实验表明，Logs2Graphs在简单数据集上的表现至少与最先进的日志异常检测方法相当，而在复杂数据集上则大幅超越最先进的日志异常检测方法。

相关内容

异常检测

关注 0

在数据挖掘中，异常检测（英语：anomaly detection）对不符合预期模式或数据集中其他项目的项目、事件或观测值的识别。通常异常项目会转变成银行欺诈、结构缺陷、医疗问题、文本错误等类型的问题。异常也被称为离群值、新奇、噪声、偏差和例外。特别是在检测滥用与网络入侵时，有趣性对象往往不是罕见对象，但却是超出预料的突发活动。这种模式不遵循通常统计定义中把异常点看作是罕见对象，于是许多异常检测方法（特别是无监督的方法）将对此类数据失效，除非进行了合适的聚集。相反，聚类分析算法可能可以检测出这些模式形成的微聚类。有三大类异常检测方法。[1] 在假设数据集中大多数实例都是正常的前提下，无监督异常检测方法能通过寻找与其他数据最不匹配的实例来检测出未标记测试数据的异常。监督式异常检测方法需要一个已经被标记“正常”与“异常”的数据集，并涉及到训练分类器（与许多其他的统计分类问题的关键区别是异常检测的内在不均衡性）。半监督式异常检测方法根据一个给定的正常训练数据集创建一个表示正常行为的模型，然后检测由学习模型生成的测试实例的可能性。

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日