Despite its effectiveness, the Vision Transformer's lack of interpretability may hinder its use in critical real-world applications. To overcome this issue, we propose a post-hoc interpretability method called VISION DIFFMASK, which uses the activations of the model's hidden layers to predict which parts of the input contribute to its final prediction. Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes. We demonstrate the faithfulness of our method by introducing a faithfulness task and comparing it to other state-of-the-art attribution methods on CIFAR-10 and ImageNet-1K, achieving compelling results. To aid reproducibility and further extension of our work, we open-source our implementation: https://github.com/AngelosNal/Vision-DiffMask
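The gating idea described above can be illustrated with a small, self-contained sketch. It is not the released implementation: the linear `toy_classifier`, the `baseline` value, the fixed `sparsity_weight`, and all function names are assumptions made for illustration, and the gates are passed in by hand rather than predicted from hidden-layer activations. The sketch only shows the trade-off being optimized: keep the masked prediction close to the original one (measured here with a KL divergence) while keeping as few input patches as possible.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_classifier(patches, weights):
    """Hypothetical stand-in for a Vision Transformer: linear logits over patch values."""
    logits = [sum(w * x for w, x in zip(row, patches)) for row in weights]
    return softmax(logits)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def masking_objective(patches, gates, weights, baseline=0.0, sparsity_weight=0.1):
    """Return (objective, divergence, kept_fraction) for a given gate vector.

    Each gate lies in [0, 1]: 1 keeps the patch, 0 replaces it with `baseline`.
    The fixed `sparsity_weight` is a simplification assumed for this sketch.
    """
    masked = [g * x + (1.0 - g) * baseline for g, x in zip(gates, patches)]
    original = toy_classifier(patches, weights)
    perturbed = toy_classifier(masked, weights)
    divergence = kl_divergence(original, perturbed)
    kept_fraction = sum(gates) / len(gates)
    return divergence + sparsity_weight * kept_fraction, divergence, kept_fraction


# Toy input: four "patches", two of which carry no signal (value 0).
patches = [2.0, 0.0, 1.0, 0.0]
weights = [[1.0, 0.5, -1.0, 0.0],
           [-1.0, 0.0, 1.0, 0.5]]

keep_all = masking_objective(patches, [1.0, 1.0, 1.0, 1.0], weights)
keep_relevant = masking_objective(patches, [1.0, 0.0, 1.0, 0.0], weights)
keep_none = masking_objective(patches, [0.0, 0.0, 0.0, 0.0], weights)
```

In this toy setting, dropping the two uninformative patches leaves the predicted distribution unchanged while halving the kept fraction, so `keep_relevant` scores lower than `keep_all`; masking everything changes the prediction and is penalized through the divergence term.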