We present Learning to Explain (LTX), a model-agnostic framework for post-hoc explanation of vision models. LTX introduces an "explainer" model that generates explanation maps highlighting the regions that justify the predictions of the model being explained. The explainer is trained in two stages: initial pretraining followed by per-instance finetuning. In both stages, the explained model's prediction for a masked input is compared with its original prediction for the unmasked input. This setup enables a novel counterfactual objective, which aims to anticipate the model's output using masked versions of the input image. LTX is not restricted to a specific model architecture and can provide explanations for both Transformer-based and convolutional models. Our evaluations show that LTX significantly outperforms the current state of the art in explainability across various metrics.
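To make the masked-versus-unmasked comparison concrete, the following is a minimal sketch of one way such a loss could look. It is not the LTX objective itself, whose exact form the abstract does not give. The names `toy_model`, `masked_prediction_loss`, `make_demo`, and the `sparsity_weight` parameter are hypothetical, and a linear classifier stands in for the explained vision model.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def toy_model(x, W):
    """Hypothetical stand-in for the explained model: a linear classifier on flattened pixels."""
    return softmax(W @ x.ravel())

def masked_prediction_loss(model, x, mask, sparsity_weight=0.1):
    """Illustrative loss (an assumption, not the paper's formula).

    Penalizes masks under which the model no longer predicts the class it
    predicted for the unmasked input, plus a small penalty on mask size.
    """
    p_orig = model(x)                    # prediction on the unmasked input
    target = int(np.argmax(p_orig))      # class the explanation should justify
    p_masked = model(x * mask)           # prediction on the masked input
    ce = -np.log(p_masked[target] + 1e-12)
    return ce + sparsity_weight * mask.mean()

def make_demo():
    """Build a 4x4 toy image whose class-0 evidence sits in the top-left 2x2 patch."""
    x = np.ones((4, 4))
    W = np.zeros((3, 16))
    evidence = np.zeros((4, 4))
    evidence[:2, :2] = 1.0
    W[0] = evidence.ravel()              # only class 0 reads the top-left patch
    good_mask = evidence.copy()          # keeps the evidence region
    bad_mask = np.zeros((4, 4))
    bad_mask[2:, 2:] = 1.0               # keeps an irrelevant region of equal size
    return x, W, good_mask, bad_mask

x, W, good_mask, bad_mask = make_demo()
model = lambda img: toy_model(img, W)
loss_good = masked_prediction_loss(model, x, good_mask)
loss_bad = masked_prediction_loss(model, x, bad_mask)
```

In this toy setting, the mask that keeps the evidence region yields a lower loss than an equally sized mask over an irrelevant region, which is the kind of signal an explainer could be trained on. In a two-stage scheme like the one described above, a loss of this kind would first be minimized over a dataset and then further minimized for a single image.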