Vulnerability Detection with Graph Simplification and Enhanced Graph Representation Learning

Prior studies have demonstrated the effectiveness of Deep Learning (DL) in automated software vulnerability detection. Graph Neural Networks (GNNs) have proven effective in learning graph representations of source code and are commonly adopted by existing DL-based vulnerability detection methods. However, existing methods remain limited because GNNs inherently struggle to handle connections between long-distance nodes in a code structure graph. In addition, they do not fully exploit the multiple edge types in a code structure graph (such as edges representing data flow and control flow). Consequently, despite achieving state-of-the-art performance, existing GNN-based methods tend to fail to capture the global information (i.e., long-range dependencies among nodes) of code graphs. To mitigate these issues, we propose AMPLE, a novel vulnerability detection framework with grAph siMplification and enhanced graph rePresentation LEarning. AMPLE consists of two main parts: 1) graph simplification, which reduces the distances between nodes by reducing the number of nodes in code structure graphs; and 2) enhanced graph representation learning, which comprises an edge-aware graph convolutional network module that fuses heterogeneous edge information into node representations and a kernel-scaled representation module that captures the relations between distant graph nodes. Experiments on three public benchmark datasets show that AMPLE outperforms the state-of-the-art methods by 0.39%-35.32% in accuracy and 7.64%-199.81% in F1 score. The results demonstrate the effectiveness of AMPLE in learning global information of code graphs for vulnerability detection.
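
To make the idea of fusing heterogeneous edge information into node representations more concrete, the following is a minimal sketch of one edge-type-aware graph convolution step. It is not the AMPLE implementation: it assumes a generic relational-GCN-style update with one weight matrix per edge type and mean aggregation over neighbors. The function names (`matvec`, `edge_aware_layer`), the edge-type labels, and the toy graph and weights are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def edge_aware_layer(features, edges, type_weights, self_weight):
    """One relational graph-convolution step (illustrative assumption).

    features:     {node_id: feature vector}
    edges:        list of (src, dst, edge_type) triples
    type_weights: {edge_type: weight matrix}, one matrix per edge type
    self_weight:  weight matrix applied to the node's own features
    """
    # Group incoming neighbors by (destination node, edge type).
    incoming = defaultdict(list)
    for src, dst, edge_type in edges:
        incoming[(dst, edge_type)].append(src)

    updated = {}
    for node, h in features.items():
        out = matvec(self_weight, h)
        for edge_type, weight in type_weights.items():
            neighbors = incoming.get((node, edge_type), [])
            for src in neighbors:
                message = matvec(weight, features[src])
                # Mean-aggregate messages within each edge type.
                out = [o + m / len(neighbors) for o, m in zip(out, message)]
        # ReLU non-linearity.
        updated[node] = [max(0.0, o) for o in out]
    return updated


# Toy graph: node 2 receives one data-flow edge and one control-flow edge.
identity = [[1.0, 0.0], [0.0, 1.0]]
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = [(0, 2, "data_flow"), (1, 2, "control_flow")]
type_weights = {
    "data_flow": identity,
    "control_flow": [[2.0, 0.0], [0.0, 2.0]],
}

result = edge_aware_layer(features, edges, type_weights, identity)
print(result[2])
```

In this toy setup the control-flow message is scaled differently from the data-flow message, so the updated representation of node 2 depends on the type of each incoming edge and not only on which nodes are its neighbors.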
