The issue of shortcut learning is widely known in NLP and has been an important research focus in recent years. Unintended correlations in the data enable models to easily solve tasks that were meant to exhibit advanced language understanding and reasoning capabilities. In this survey paper, we focus on the field of machine reading comprehension (MRC), an important task for showcasing high-level language understanding that also suffers from a range of shortcuts. We summarize the available techniques for measuring and mitigating shortcuts and conclude with suggestions for further progress in shortcut research. Importantly, we highlight two concerns for shortcut mitigation in MRC: (1) the lack of public challenge sets, a necessary component for effective and reusable evaluation, and (2) the lack of certain mitigation techniques that are prominent in other areas.