The rise of deep learning in image classification has brought unprecedented accuracy, but it has also highlighted a key issue: models' reliance on 'shortcuts'. Shortcuts are easy-to-learn patterns in the training data that fail to generalise to new data. Examples include using a copyright watermark to recognise horses, a snowy background to recognise huskies, or ink markings to detect malignant skin lesions. The explainable AI (XAI) community has suggested using instance-level explanations to detect shortcuts without external data, but confirming that a shortcut is present requires examining many explanations, which makes the process labour-intensive. To address this challenge, we introduce Counterfactual Frequency (CoF) tables, a novel approach that aggregates instance-based explanations into global insights and exposes shortcuts. Aggregation requires the explanations to be expressed in terms of semantic concepts, which we obtain by labelling the segments of an image. We demonstrate the utility of CoF tables across several datasets, revealing the shortcuts learned from them.
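The aggregation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how a CoF table is computed, so the function name `cof_table`, the input format (one set of semantic segment labels per image, listing the segments whose edit changed the prediction), and the choice to report the fraction of images per label are all assumptions made for illustration.

```python
from collections import Counter


def cof_table(explanations):
    """Aggregate per-image counterfactual explanations into label frequencies.

    `explanations` holds one entry per image. Each entry is a collection of
    semantic segment labels whose edit changed the model's prediction
    (a hypothetical input format, assumed for this sketch).
    Returns a dict mapping label -> fraction of images, most frequent first.
    """
    n = len(explanations)
    if n == 0:
        return {}
    counts = Counter()
    for labels in explanations:
        # Count each label at most once per image.
        counts.update(set(labels))
    return {label: count / n for label, count in counts.most_common()}


# Toy data (invented for illustration): labels that flipped a "horse" prediction.
explanations = [
    {"watermark"},
    {"watermark", "horse"},
    {"horse"},
    {"watermark"},
]
table = cof_table(explanations)
for label, freq in table.items():
    print(f"{label}: {freq:.2f}")
```

In this toy data, a label such as `watermark` appearing in more counterfactuals than the object itself is the kind of global signal that would otherwise require inspecting each explanation individually.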