Machine Reading Comprehension (MRC) models tend to exploit spurious correlations (also known in the research community as dataset bias or annotation artifacts). Consequently, these models may perform the MRC task without fully comprehending the given context and question, which is undesirable because it can reduce robustness to distribution shift. This paper examines answer-position bias, in which a significant percentage of training questions have answers located solely in the first sentence of the context. We propose the Single-Sentence Reader as a new approach to addressing answer-position bias in MRC. We implement this approach using six different models and thoroughly analyze their performance. Remarkably, our proposed Single-Sentence Readers achieve results that nearly match those of models trained on conventional training sets, demonstrating their effectiveness. Our study also discusses several challenges that Single-Sentence Readers encounter and proposes a potential solution.
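As a rough illustration of the answer-position bias described above, the following minimal sketch estimates what fraction of extractive QA examples have their answer span entirely inside the first sentence of the context. It is not the procedure used in the paper: the field names (`context`, `answer_text`, `answer_start`), the helper functions, and the naive punctuation-based sentence splitter are all assumptions made for this example, and a real analysis would likely use a proper sentence tokenizer.

```python
import re


def first_sentence_end(context: str) -> int:
    """Return the character offset just past the first sentence.

    Assumption: a sentence ends at the first '.', '!' or '?' that is
    followed by whitespace or the end of the text. This naive rule will
    mis-split abbreviations such as "Dr. Smith".
    """
    match = re.search(r"[.!?](?=\s|$)", context)
    return match.end() if match else len(context)


def answer_in_first_sentence(context: str, answer_text: str, answer_start: int) -> bool:
    """True if the whole answer span lies inside the first sentence."""
    answer_end = answer_start + len(answer_text)
    return answer_end <= first_sentence_end(context)


def first_sentence_ratio(examples) -> float:
    """Fraction of examples whose answer is located in the first sentence."""
    if not examples:
        return 0.0
    hits = sum(
        answer_in_first_sentence(ex["context"], ex["answer_text"], ex["answer_start"])
        for ex in examples
    )
    return hits / len(examples)


# Toy data in a hypothetical SQuAD-like layout (character-level answer offsets).
context = "Paris is the capital of France. It lies on the Seine."
examples = [
    {"context": context, "answer_text": "Paris", "answer_start": context.index("Paris")},
    {"context": context, "answer_text": "Seine", "answer_start": context.index("Seine")},
]

print(first_sentence_ratio(examples))  # 0.5
```

A high ratio on a training set would indicate that a model could score well by attending mostly to the first sentence, which is the kind of shortcut the abstract warns about.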