VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection

Deep Learning (DL)-based methods have proven effective for software vulnerability detection and could substantially improve developer productivity. Current methods mainly focus on detecting vulnerabilities within single functions (i.e., intra-procedural vulnerabilities), ignoring the more complex inter-procedural detection scenarios that arise in practice. For example, developers routinely rely on program analysis to detect vulnerabilities that span multiple functions within a repository. In addition, the widely used benchmark datasets generally contain only intra-procedural vulnerabilities, leaving the ability to detect inter-procedural vulnerabilities unassessed. To address these issues, we propose a repository-level evaluation system, named **VulEval**, which evaluates the detection of inter- and intra-procedural vulnerabilities simultaneously. Specifically, VulEval consists of three interconnected evaluation tasks: **(1) Function-Level Vulnerability Detection**, which detects intra-procedural vulnerabilities given a code snippet; **(2) Vulnerability-Related Dependency Prediction**, which retrieves the most relevant dependencies from call graphs to give developers explanations of the vulnerabilities; and **(3) Repository-Level Vulnerability Detection**, which detects inter-procedural vulnerabilities by combining the code snippet with the dependencies identified in the second task. VulEval also includes a large-scale dataset with a total of 4,196 CVE entries, 232,239 functions, and 4,699 corresponding items of repository-level source code in C/C++. Our analysis highlights the current progress and future directions for software vulnerability detection.
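The way the three tasks feed into one another can be illustrated with a minimal Python sketch. It is not VulEval's implementation: the `Function` class, the `RISKY_CALLS` keyword heuristic, and all function names below are hypothetical stand-ins chosen for illustration, and a real system would use learned detectors and retrievers in their place. The sketch only shows the data flow: a function-level check on a snippet, a ranking of the snippet's callees taken from the call graph, and a repository-level check on the snippet plus the top-ranked callees.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Function:
    """A function in the repository plus its outgoing call-graph edges."""
    name: str
    code: str
    callees: list[str] = field(default_factory=list)


# Toy heuristic (an assumption for illustration, not taken from the paper).
RISKY_CALLS = ("strcpy", "sprintf", "gets")


def risk_score(code: str) -> int:
    """Count occurrences of risky calls in a piece of code."""
    return sum(code.count(call + "(") for call in RISKY_CALLS)


def detect_function_level(code: str) -> bool:
    """Task 1 stand-in: flag a snippet using only its own text."""
    return risk_score(code) > 0


def rank_dependencies(target: Function, repo: dict[str, Function],
                      top_k: int = 1) -> list[Function]:
    """Task 2 stand-in: rank the target's callees by the toy risk score."""
    candidates = [repo[name] for name in target.callees if name in repo]
    ranked = sorted(candidates, key=lambda f: risk_score(f.code), reverse=True)
    return ranked[:top_k]


def detect_repository_level(target: Function, repo: dict[str, Function],
                            top_k: int = 1) -> bool:
    """Task 3 stand-in: re-run detection on the snippet plus retrieved callees."""
    deps = rank_dependencies(target, repo, top_k)
    context = "\n".join([target.code] + [dep.code for dep in deps])
    return detect_function_level(context)


# Hypothetical three-function repository: the unsafe copy lives in a callee.
repo = {
    "handle_request": Function(
        "handle_request",
        "void handle_request(char *in) { char buf[16]; copy_input(buf, in); log_msg(in); }",
        callees=["copy_input", "log_msg"],
    ),
    "copy_input": Function(
        "copy_input",
        "void copy_input(char *dst, char *src) { strcpy(dst, src); }",
    ),
    "log_msg": Function("log_msg", "void log_msg(char *m) { puts(m); }"),
}

target = repo["handle_request"]
function_level = detect_function_level(target.code)
top_dependencies = [dep.name for dep in rank_dependencies(target, repo)]
repository_level = detect_repository_level(target, repo)
```

In this toy repository the function-level check misses the problem because `handle_request` contains no risky call itself, while the repository-level check flags it once the retrieved callee `copy_input` is added to the context. That gap is the kind of inter-procedural case the abstract argues function-level benchmarks leave unmeasured.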

