The RemOve-And-Retrain (ROAR) benchmark is widely used to evaluate feature attribution methods, yet its validity remains underexplored from an information-theoretic perspective. We show that model- and data-agnostic post-processing of attribution maps (transformations that, by the data processing inequality, \emph{cannot} add information about the decision function) can often improve ROAR scores. This means that an improved ROAR ranking is not, by itself, evidence that an attribution map carries more information about the model. We trace this failure mode to a bias toward spatially blurry masks. Experiments on CIFAR-10, SVHN, and CUB-200 show a consistent association between blurriness and ROAR performance, a pattern that also appears in the ROAD variant. We provide guidelines for more cautious removal-based benchmarking, with implications for validating mechanistic understanding of neural network internals.
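As a rough illustration of what a model- and data-agnostic post-processing step can look like, the sketch below blurs a stand-in attribution map and compares the resulting top-k removal masks. It is not the authors' pipeline: the random map, the Gaussian blur with `sigma=2.0`, the 30% removal fraction, and the helper names `blur`, `removal_mask`, and `boundary_length` are all assumptions chosen for illustration. The blur reads only the map itself, so it cannot add information about the model, yet it changes the spatial structure of the mask that a removal-based benchmark would apply.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalised to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(attr, sigma=2.0):
    """Separable Gaussian blur of a 2-D map; uses no model and no data."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(attr, pad, mode="reflect")
    # Convolve along rows, then along columns; "valid" undoes the padding.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def removal_mask(attr, fraction=0.3):
    """Boolean mask marking the top `fraction` of pixels (by score) for removal."""
    k = int(round(fraction * attr.size))
    top = np.argsort(attr, axis=None)[::-1][:k]
    mask = np.zeros(attr.size, dtype=bool)
    mask[top] = True
    return mask.reshape(attr.shape)

def boundary_length(mask):
    """Count of adjacent pixel pairs that differ; lower means a blobbier mask."""
    horizontal = (mask[:, 1:] != mask[:, :-1]).sum()
    vertical = (mask[1:, :] != mask[:-1, :]).sum()
    return int(horizontal + vertical)

rng = np.random.default_rng(0)
attr = rng.random((32, 32))              # hypothetical attribution map
raw_mask = removal_mask(attr)
blurred_mask = removal_mask(blur(attr))  # same pixel budget, smoother shape

print(boundary_length(raw_mask), boundary_length(blurred_mask))
```

Both masks remove the same number of pixels; only their spatial layout differs. If a benchmark rewards the blurred mask, that improvement reflects a preference for mask shape rather than additional information in the attribution.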