Although many deep learning (DL)-based vulnerability detection approaches have been proposed and have achieved remarkable performance, they still have limitations in generalization and in practical use. More precisely, existing DL-based approaches (1) perform poorly when predicting functions that are lexically similar but have opposite semantics, and (2) provide no intuitive, developer-oriented explanations of the detection results. In this paper, we propose a novel approach named SVulD, a function-level Subtle semantic embedding for Vulnerability Detection along with intuitive explanations, to alleviate these limitations. Specifically, SVulD first trains a model to learn semantic representations that distinguish functions regardless of their lexical similarity. Then, for the functions detected as vulnerable, SVulD provides natural language explanations of the results (e.g., the root cause) to help developers intuitively understand the vulnerabilities. To evaluate the effectiveness of SVulD, we conduct large-scale experiments on a widely used practical vulnerability dataset and compare it with four state-of-the-art (SOTA) approaches on five performance measures. The experimental results indicate that SVulD outperforms all four SOTA approaches by a substantial margin (i.e., 23.5%-68.0% in terms of F1-score, 15.9%-134.8% in terms of PR-AUC, and 7.4%-64.4% in terms of Accuracy). In addition, we conduct a user case study to evaluate how useful SVulD is in helping developers understand vulnerable code, and the participants' feedback indicates that SVulD is helpful for development practice.