Coca: Improving and Explaining Graph Neural Network-Based Vulnerability Detection Systems

Recently, Graph Neural Network (GNN)-based vulnerability detection systems have achieved remarkable success. However, their lack of explainability poses a critical challenge to deploying black-box models in security-related domains. For this reason, several approaches have been proposed to explain the decision logic of the detection model by providing a set of crucial statements that contribute positively to its predictions. Unfortunately, because the detection models are weakly robust and the explanation strategies are suboptimal, these approaches risk revealing spurious correlations and producing redundant explanations. In this paper, we propose Coca, a general framework aiming to 1) enhance the robustness of existing GNN-based vulnerability detection models to avoid spurious explanations; and 2) provide both concise and effective explanations to reason about the detected vulnerabilities. Coca consists of two core parts referred to as Trainer and Explainer. The former trains a detection model that is robust to random perturbation based on combinatorial contrastive learning, while the latter builds an explainer that uses dual-view causal inference to derive, as explanations, the code statements most decisive for the detected vulnerability. We apply Coca to three typical GNN-based vulnerability detectors. Experimental results show that Coca can effectively mitigate the spurious correlation issue and provide more useful, high-quality explanations.
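
The abstract does not spell out Coca's combinatorial contrastive objective, so the following is only a minimal sketch of the general idea behind contrastive training for robustness, using the generic InfoNCE loss as a stand-in. The function names, the toy embedding vectors, and the temperature value are all illustrative assumptions and are not taken from the paper. The loss is low when a sample's embedding stays close to that of its perturbed variant (the positive) and far from embeddings of unrelated samples (the negatives).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss (an assumption, not Coca's exact objective).

    anchor:    embedding of the original code sample
    positive:  embedding of a perturbed variant of the same sample
    negatives: embeddings of unrelated samples
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D embeddings, chosen only for illustration.
anchor = [1.0, 0.0]
perturbed = [0.9, 0.1]   # stays close to the anchor
unrelated = [-1.0, 0.0]  # points away from the anchor

robust_loss = info_nce(anchor, perturbed, [unrelated])
fragile_loss = info_nce(anchor, unrelated, [perturbed])
```

Here `robust_loss` is smaller than `fragile_loss`: minimizing such a loss rewards a model whose representation is stable under perturbation, which is the property the Trainer is described as targeting.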

