Software vulnerability detection is crucial for high-quality software development. Recent studies that use Graph Neural Networks (GNNs) to learn graph representations of code have achieved remarkable success in vulnerability detection. However, existing graph-based approaches face two main limitations that prevent them from generalizing well to large code graphs: (1) interference from noisy information in the code graph, and (2) difficulty in capturing long-distance dependencies within the graph. To mitigate these problems, we propose ANGLE, a novel vulnerability detection method built on two components: hierarchical graph refinement and context-aware graph representation learning. The former hierarchically filters redundant information from the code graph, thereby reducing its size; the latter combines a Graph Transformer with a GNN to learn code graph representations from both global and local perspectives, thereby capturing long-distance dependencies. Extensive experiments on three widely used benchmark datasets show that ANGLE significantly outperforms several baselines in terms of accuracy and F1 score. In particular, on large code graphs, ANGLE improves accuracy by 34.27%-161.93% compared to the state-of-the-art method, AMPLE. These results demonstrate the effectiveness of ANGLE in vulnerability detection tasks.
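The contrast between local and global views of a code graph can be illustrated with a toy sketch. This is not ANGLE's implementation: the function names, the mean-neighbour aggregation, and the parameter-free single-head attention are all assumptions chosen for brevity. The sketch only shows why a message-passing step sees nearby nodes while an attention step can relate distant ones.

```python
import math

def local_update(x, adj):
    """One round of mean aggregation over each node and its direct neighbours
    (a generic GNN-style step; hypothetical stand-in for the local view)."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        nbrs = [i] + [j for j in range(n) if j != i and adj[i][j]]
        out.append([sum(x[j][k] for j in nbrs) / len(nbrs) for k in range(d)])
    return out

def global_update(x):
    """Single-head self-attention over all nodes, ignoring edges
    (hypothetical stand-in for the global view; no learned weights)."""
    n, d = len(x), len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in x]
        top = max(scores)                      # subtract max for numerical stability
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.append([sum(w[j] / z * x[j][k] for j in range(n)) for k in range(d)])
    return out

def graph_embedding(x, adj):
    """Concatenate local and global node features, then mean-pool over nodes."""
    rows = [l + g for l, g in zip(local_update(x, adj), global_update(x))]
    return [sum(r[k] for r in rows) / len(rows) for k in range(len(rows[0]))]

# Toy path graph 0-1-2-3 with one-hot node features (illustrative data only).
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
x = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

local = local_update(x, adj)
glob = global_update(x)
emb = graph_embedding(x, adj)
```

In this toy graph, node 0 and node 3 are three hops apart. After one local step, node 0 carries no information from node 3, whereas the attention step gives node 3 a nonzero weight immediately. A real model would use learned projections and several stacked layers.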