This study explores the effectiveness of graph neural networks (GNNs) for vulnerability detection in software code, using a real-world dataset of Java vulnerability-fixing commits. The dataset is naturally partitioned by the number of methods modified in each commit, which makes it possible to investigate a variety of scenarios. The primary goal is to evaluate how generally applicable GNNs are to identifying vulnerable code segments and distinguishing them both from their fixed versions and from random non-vulnerable code. Through a series of experiments, the study examines which model configurations and data subsets improve the prediction accuracy of GNN models. The experiments indicate that certain configurations, such as pruning specific graph elements and excluding certain types of code representation, significantly improve performance. The study also highlights the importance of including random data in training to optimize the detection capabilities of GNNs.