Shortcuts, also described as Clever Hans behavior, spurious correlations, or confounders, present a significant challenge in machine learning and AI, critically affecting model generalization and robustness. Research in this area, however, remains fragmented across various terminologies, hindering the progress of the field as a whole. Consequently, we introduce a unifying taxonomy of shortcut learning by providing a formal definition of shortcuts and bridging the diverse terms used in the literature. In doing so, we further establish important connections between shortcuts and related fields, including bias, causality, and security, where parallels exist but are rarely discussed. Our taxonomy organizes existing approaches for shortcut detection and mitigation, providing a comprehensive overview of the current state of the field and revealing underexplored areas and open challenges. Moreover, we compile and classify datasets tailored to study shortcut learning. Altogether, this work provides a holistic perspective to deepen understanding and drive the development of more effective strategies for addressing shortcuts in machine learning.