Certifiably Robust Interpretation via Renyi Differential Privacy

Motivated by the recent discovery that the interpretation maps of CNNs can easily be manipulated by adversarial attacks against network interpretability, we study the problem of interpretation robustness from the new perspective of Rényi differential privacy (RDP). Our method, Renyi-Robust-Smooth (an RDP-based interpretation method), has three advantages. First, it offers provable and certifiable top-$k$ robustness: the top-$k$ most important attributions of the interpretation map are provably robust under any input perturbation with bounded $\ell_d$-norm (for any $d\geq 1$, including $d = \infty$). Second, it offers $\sim10\%$ better experimental robustness than existing approaches in terms of the top-$k$ attributions. Remarkably, Renyi-Robust-Smooth also achieves higher accuracy than existing approaches. Third, it provides a smooth tradeoff between robustness and computational efficiency. Experimentally, its top-$k$ attributions are twice as robust as those of existing approaches when computational resources are highly constrained.
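
To make the notion of top-$k$ robustness concrete, the following is a minimal sketch of one way to measure it. It assumes that robustness is scored as the fraction of top-$k$ components shared between the interpretation map of the original input and that of a perturbed input. The function names `top_k_indices` and `top_k_overlap` are hypothetical and the toy arrays are illustrative only; none of this is taken from the paper's implementation.

```python
import numpy as np

def top_k_indices(attribution, k):
    """Return the set of flat indices of the k largest attribution values."""
    flat = np.asarray(attribution, dtype=float).ravel()
    # argpartition places the k largest entries in the last k positions
    # (in arbitrary order), which is enough for a set comparison.
    return set(np.argpartition(flat, -k)[-k:].tolist())

def top_k_overlap(attr_original, attr_perturbed, k):
    """Fraction of top-k components shared by two interpretation maps."""
    shared = top_k_indices(attr_original, k) & top_k_indices(attr_perturbed, k)
    return len(shared) / k

# Toy interpretation maps (illustrative values, not experimental data).
original = np.array([0.9, 0.1, 0.8, 0.2, 0.7])
perturbed = np.array([0.85, 0.75, 0.1, 0.2, 0.8])
overlap = top_k_overlap(original, perturbed, k=3)
```

Under this assumed metric, an overlap of 1.0 means the perturbation left the top-$k$ set unchanged, and lower values indicate that the perturbation displaced some of the most important attributions.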
