Machine Reading Comprehension (MRC) models tend to exploit spurious correlations (also referred to as dataset bias or annotation artifacts in the research community). Consequently, these models may perform the MRC task without fully comprehending the given context and question. This is undesirable because it can leave them poorly robust to distribution shift. This paper examines answer-position bias, in which a significant percentage of training questions have answers located solely in the first sentence of the context. We propose the Single-Sentence Reader as a new approach for addressing answer-position bias in MRC. We implement this approach with six different models and analyze their performance in detail. Remarkably, our Single-Sentence Readers achieve results that nearly match those of models trained on conventional training sets, demonstrating their effectiveness. Our study also discusses several challenges that the Single-Sentence Readers encounter and proposes a potential solution.
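To make answer-position bias concrete, the following is a minimal sketch that measures how often a training answer lies entirely within the first sentence of its context. It assumes SQuAD-style examples given as `(context, answer_start, answer_text)` tuples with character offsets. The helper names (`sentence_spans`, `answer_in_first_sentence`, `first_sentence_ratio`) are hypothetical and are not taken from the paper. The regex sentence splitter is deliberately naive; a real pipeline would use a proper sentence tokenizer, since abbreviations such as "U.S." will be split incorrectly.

```python
import re
from typing import List, Tuple

Example = Tuple[str, int, str]  # (context, answer_start, answer_text); assumed SQuAD-style format


def sentence_spans(context: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of each sentence in `context`.

    Naive splitter (an assumption, not the paper's method): a sentence ends
    at '.', '!' or '?' when the next character is whitespace.
    """
    spans = []
    start = 0
    for m in re.finditer(r"[.!?](?=\s)", context):
        end = m.end()
        spans.append((start, end))
        # Skip the whitespace that separates this sentence from the next one.
        start = end
        while start < len(context) and context[start].isspace():
            start += 1
    # Whatever remains after the last terminator is the final sentence.
    if start < len(context):
        spans.append((start, len(context)))
    return spans


def answer_in_first_sentence(context: str, answer_start: int, answer_text: str) -> bool:
    """True if the whole answer span falls inside the first sentence."""
    spans = sentence_spans(context)
    if not spans:
        return False
    first_start, first_end = spans[0]
    answer_end = answer_start + len(answer_text)
    return first_start <= answer_start and answer_end <= first_end


def first_sentence_ratio(examples: List[Example]) -> float:
    """Fraction of examples whose answer lies solely in the first sentence."""
    if not examples:
        return 0.0
    hits = sum(answer_in_first_sentence(c, s, t) for c, s, t in examples)
    return hits / len(examples)


# Toy usage: one answer in the first sentence, one in the second.
examples = [
    ("Paris is the capital of France. It is large.", 0, "Paris"),
    ("The sky is blue. Grass is green.", 17, "Grass"),
]
print(first_sentence_ratio(examples))  # 0.5
```

A ratio far above what a uniform spread over sentence positions would give is the kind of skew the abstract describes as answer-position bias.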