Does the stethoscope in the picture make the adjacent person a doctor or a patient? The answer depends on the contextual relationship between the two objects. If this is obvious, why don't explanation methods for vision models use contextual information? In this paper, we (1) review the most popular methods for explaining computer vision models and point out that they do not take context information into account, (2) show examples of failures of popular XAI methods, (3) provide examples of real-world use cases where spatial context plays a significant role, (4) propose new research directions that may lead to better use of context information in explaining computer vision models, and (5) argue that explanations need to shift from 'where' to 'how'.