Recently, Graph Neural Network (GNN)-based vulnerability detection systems have achieved remarkable success. However, their lack of explainability poses a critical challenge to deploying such black-box models in security-related domains. For this reason, several approaches have been proposed to explain the decision logic of a detection model by providing a set of crucial statements that positively contribute to its predictions. Unfortunately, because the underlying detection models are weakly robust and the explanation strategies are suboptimal, these approaches risk revealing spurious correlations and suffer from redundancy issues. In this paper, we propose Coca, a general framework that aims to 1) enhance the robustness of existing GNN-based vulnerability detection models so as to avoid spurious explanations, and 2) provide explanations that are both concise and effective for reasoning about the detected vulnerabilities. Coca consists of two core parts, referred to as the Trainer and the Explainer. The former trains a detection model that is robust to random perturbations via combinatorial contrastive learning, while the latter builds an explainer that uses dual-view causal inference to derive, as explanations, the code statements most decisive to the detected vulnerability. We apply Coca to three typical GNN-based vulnerability detectors. Experimental results show that Coca can effectively mitigate the spurious correlation issue and provide more useful, higher-quality explanations.
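To make the Explainer's goal more concrete, the following is a minimal Python sketch under stated assumptions; it is not Coca's implementation. It assumes that "dual-view" causal inference combines a factual view (the selected statements alone should preserve the "vulnerable" prediction) with a counterfactual view (removing them should overturn it). The GNN detector is replaced by a hypothetical `detector` function that scores the set of kept statement ids, and the names `dual_view_explain`, `detector`, and the 0.5 threshold are illustrative only.

```python
from typing import Callable, FrozenSet, List

# Hypothetical detector interface (assumption): maps the set of kept statement
# ids to a vulnerability probability. A real GNN detector would instead score
# the program graph induced by those statements.
Detector = Callable[[FrozenSet[int]], float]


def dual_view_explain(detector: Detector, statements: List[int],
                      threshold: float = 0.5) -> FrozenSet[int]:
    """Greedily grow a statement subset that is both factually sufficient
    (kept alone, the model still predicts 'vulnerable') and counterfactually
    necessary (once removed, the model no longer predicts 'vulnerable')."""
    full = frozenset(statements)
    base = detector(full)
    # Rank statements by how much deleting each one alone lowers the score.
    ranked = sorted(statements,
                    key=lambda s: base - detector(full - {s}),
                    reverse=True)
    chosen: FrozenSet[int] = frozenset()
    for s in ranked:
        chosen = chosen | {s}
        factual = detector(chosen) >= threshold             # sufficient
        counterfactual = detector(full - chosen) < threshold  # necessary
        if factual and counterfactual:
            return chosen
    # No smaller subset passed both checks; fall back to all statements.
    return chosen


if __name__ == "__main__":
    # Toy additive detector (hypothetical weights, one per statement id).
    weights = {0: 0.05, 1: 0.4, 2: 0.05, 3: 0.35, 4: 0.1}
    toy = lambda kept: min(1.0, sum(weights[s] for s in kept))
    print(sorted(dual_view_explain(toy, [0, 1, 2, 3, 4])))  # → [1, 3]
```

On the toy detector, statement 1 alone scores 0.4 and fails the factual check, whereas statements 1 and 3 together score 0.75 while the remaining statements score only 0.2, so both checks pass and the search stops. Stopping at the smallest passing subset is one simple way to favor concise explanations; the actual Explainer may select statements quite differently.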