Deep learning has been widely used in source code classification tasks, such as classifying code by functionality, code authorship attribution, and vulnerability detection. Unfortunately, the black-box nature of deep learning makes it hard to interpret and understand why a classifier (i.e., classification model) makes a particular prediction on a given example. This lack of interpretability (or explainability) may have hindered the adoption of such classifiers by practitioners, because it is unclear when a classifier's prediction should or should not be trusted. The lack of interpretability has motivated a number of studies in recent years. However, existing methods are neither robust nor able to cope with out-of-distribution examples. In this paper, we propose a novel method to produce \underline{Rob}ust \underline{in}terpreters for a given deep learning-based code classifier; the method is dubbed Robin. The key idea behind Robin is a novel hybrid structure combining an interpreter and two approximators, while leveraging the ideas of adversarial training and data augmentation. Experimental results show that, on average, the interpreter produced by Robin achieves 6.11\% higher fidelity (evaluated on the classifier), 67.22\% higher fidelity (evaluated on the approximator), and 15.87x higher robustness than the three existing interpreters we evaluated. Moreover, the interpreter is 47.31\% less affected by out-of-distribution examples than LEMNA.