While deep learning has led to huge progress in complex image classification tasks such as ImageNet, unexpected failure modes, e.g. via spurious features, call into question how reliably these classifiers work in the wild. Furthermore, for safety-critical tasks the black-box nature of their decisions is problematic, and explanations, or at least methods that make decisions plausible, are urgently needed. In this paper, we address these problems by generating images that optimize a classifier-derived objective using a framework for guided image generation. We analyze the decisions of image classifiers via visual counterfactual explanations (VCEs), detect systematic mistakes by analyzing images on which classifiers maximally disagree, and visualize neurons and spurious features. In this way, we validate existing observations, e.g. the shape bias of adversarially robust models, as well as novel failure modes, e.g. systematic errors of zero-shot CLIP classifiers. Moreover, our VCEs outperform previous work while being more versatile.