Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection

Deep learning-based vulnerability detection has shown strong performance and, in some studies, has outperformed static analysis tools. However, the highest-performing approaches use token-based transformer models, which are not the most efficient at capturing the code semantics required for vulnerability detection. Classical program analysis techniques such as dataflow analysis can detect many types of bugs based on their root causes. In this paper, we propose combining such cause-based vulnerability detection algorithms with deep learning, aiming to achieve more efficient and effective vulnerability detection. Specifically, we designed DeepDFA, a dataflow analysis-inspired graph learning framework and an embedding technique that enables graph learning to simulate dataflow computation. We show that DeepDFA is both performant and efficient. DeepDFA outperformed all non-transformer baselines. It was trained in 9 minutes, 75x faster than the highest-performing baseline model. When trained on only 50+ vulnerable examples and several hundred total examples, the model retained the same performance as when trained on 100% of the dataset. DeepDFA also generalized to real-world vulnerabilities in DbgBench; it detected 8.7 out of 17 vulnerabilities on average across folds and was able to distinguish between patched and buggy versions, while the highest-performing baseline models did not detect any vulnerabilities. By combining DeepDFA with a large language model, we surpassed the state-of-the-art vulnerability detection performance on the Big-Vul dataset with an F1 score of 96.46, a precision of 97.82, and a recall of 95.14. Our replication package is located at https://doi.org/10.6084/m9.figshare.21225413.
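
To make the notion of "dataflow computation" concrete, the following is a minimal sketch of a classical iterative dataflow analysis (reaching definitions) on a toy control-flow graph. It is not DeepDFA's implementation, and the abstract does not say which dataflow analysis the framework simulates. The CFG, the definition labels `d1` and `d2`, and the function name `reaching_definitions` are all hypothetical and chosen for illustration. The point it illustrates is that each node repeatedly merges facts from its predecessors and then applies a local update, which loosely resembles the neighbor aggregation step used in graph learning.

```python
from collections import deque


def reaching_definitions(cfg, gen, kill):
    """Classical iterative reaching-definitions analysis (illustrative only).

    cfg:  dict mapping node -> list of successor nodes
    gen:  dict mapping node -> set of definitions created at that node
    kill: dict mapping node -> set of definitions overwritten at that node
    Returns (IN, OUT): dicts mapping node -> set of reaching definitions.
    """
    # Build the predecessor map from the successor map.
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)

    IN = {n: set() for n in cfg}
    OUT = {n: set(gen[n]) for n in cfg}

    # Worklist iteration until a fixed point is reached.
    worklist = deque(cfg)
    while worklist:
        n = worklist.popleft()
        # Meet: union of the facts flowing out of all predecessors.
        IN[n] = set().union(*(OUT[p] for p in preds[n]))
        # Transfer: OUT = GEN | (IN - KILL).
        new_out = gen[n] | (IN[n] - kill[n])
        if new_out != OUT[n]:
            OUT[n] = new_out
            worklist.extend(cfg[n])  # successors must be revisited
    return IN, OUT


# Hypothetical toy program:
#   1: x = 0        (definition d1)
#   2: if cond:
#   3:     x = 1    (definition d2)
#   4: use(x)
cfg = {1: [2], 2: [3, 4], 3: [4], 4: []}
gen = {1: {"d1"}, 2: set(), 3: {"d2"}, 4: set()}
kill = {1: {"d2"}, 2: set(), 3: {"d1"}, 4: set()}

IN, OUT = reaching_definitions(cfg, gen, kill)
print(sorted(IN[4]))  # both definitions of x may reach the use at node 4
```

In this toy example, both `d1` and `d2` reach node 4, because the assignment at node 3 sits on only one of the two paths to the use of `x`.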
