In image classification scenarios where both prediction and explanation efficiency are required, self-explaining models that perform both tasks in a single inference are effective. However, for users who already have prediction-only models, training a new self-explaining model from scratch imposes significant costs in terms of both labeling and computation. This study proposes a method, based on a task arithmetic framework, that transfers the visual explanation capability of self-explaining models learned in a source domain to prediction-only models in a target domain. Our self-explaining model is built as an architectural extension of Vision Transformer-based prediction-only models, so the proposed method can endow many trained prediction-only models with explanation capability without additional training. Experiments on various image classification datasets demonstrate that, except for transfers between weakly related domains, the visual explanation capability transfers successfully from source to target domains, and explanation quality in the target domain improves without substantially sacrificing classification accuracy.
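As a rough illustration of the task arithmetic framework mentioned above, the following minimal sketch forms an "explanation task vector" as the parameter difference between a source-domain self-explaining model and its prediction-only counterpart, then adds it to a target-domain prediction-only model. The parameter names, the scaling coefficient `lam`, and the toy shapes are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of task-arithmetic transfer of explanation capability.
# Assumes all three models share the same ViT backbone parameterization.
import torch

def explanation_task_vector(theta_se_src, theta_pred_src):
    """Task vector = (source self-explaining weights) - (source prediction-only weights)."""
    return {k: theta_se_src[k] - theta_pred_src[k] for k in theta_pred_src}

def add_explanation_capability(theta_pred_tgt, tau, lam=1.0):
    """Add the scaled explanation task vector to a target prediction-only model."""
    return {k: theta_pred_tgt[k] + lam * tau[k] for k in theta_pred_tgt}

# Toy usage with random tensors standing in for shared ViT parameters.
keys = ["blocks.0.attn.qkv.weight", "blocks.0.mlp.fc1.weight"]  # hypothetical names
theta_pred_src = {k: torch.randn(4, 4) for k in keys}
theta_se_src   = {k: theta_pred_src[k] + 0.01 * torch.randn(4, 4) for k in keys}
theta_pred_tgt = {k: torch.randn(4, 4) for k in keys}

tau = explanation_task_vector(theta_se_src, theta_pred_src)
theta_se_tgt = add_explanation_capability(theta_pred_tgt, tau, lam=1.0)
```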