We present an accurate and interpretable method for answer extraction in machine reading comprehension that is reminiscent of case-based reasoning (CBR) from classical AI. Our method, CBR-MRC, builds on the hypothesis that contextualized answers to similar questions are semantically similar to each other. Given a test question, CBR-MRC first retrieves a set of similar cases from a non-parametric memory and then predicts an answer by selecting the span in the test context that is most similar to the contextualized representations of the answers in the retrieved cases. The semi-parametric nature of our approach allows it to attribute a prediction to a specific set of evidence cases, making it a desirable choice for building reliable and debuggable QA systems. We show that CBR-MRC achieves accuracy comparable to that of large reader models and outperforms baselines by 11.5 and 8.4 EM on NaturalQuestions and NewsQA, respectively. Further, we demonstrate that CBR-MRC identifies not just the correct answer tokens but also the span with the most relevant supporting evidence. Lastly, we observe that contexts for certain question types show higher lexical diversity than others, and we find that CBR-MRC is robust to these variations, whereas the performance of fully-parametric methods drops.
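The retrieve-then-select procedure described above can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes that questions, stored answers, and candidate spans are already encoded as fixed-size vectors, that similarity is cosine similarity, and that a span is scored by its mean similarity to the retrieved answers. The abstract specifies none of these choices. The names `cosine`, `retrieve_cases`, and `select_span`, along with the hand-made toy vectors, are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def retrieve_cases(query_vec, casebase, k):
    """Return the k stored cases whose question vectors are most similar to the query."""
    ranked = sorted(
        casebase,
        key=lambda case: cosine(query_vec, case["question_vec"]),
        reverse=True,
    )
    return ranked[:k]

def select_span(candidate_spans, retrieved_cases):
    """Pick the candidate span closest, on average, to the retrieved answers.

    Mean aggregation is an assumption made for this sketch.
    """
    def score(span):
        sims = [cosine(span["vec"], case["answer_vec"]) for case in retrieved_cases]
        return sum(sims) / len(sims)
    return max(candidate_spans, key=score)

# Toy casebase: hand-made 3-d vectors standing in for contextualized embeddings.
casebase = [
    {"id": "case-a", "question_vec": [1.0, 0.0, 0.0], "answer_vec": [0.0, 1.0, 0.0]},
    {"id": "case-b", "question_vec": [0.9, 0.1, 0.0], "answer_vec": [0.1, 0.9, 0.0]},
    {"id": "case-c", "question_vec": [0.0, 1.0, 0.0], "answer_vec": [1.0, 0.0, 0.0]},
    {"id": "case-d", "question_vec": [0.0, 0.0, 1.0], "answer_vec": [0.0, 0.0, 1.0]},
]

# Test question and candidate spans from its context (also toy vectors).
query_vec = [1.0, 0.05, 0.0]
candidate_spans = [
    {"text": "span 0", "vec": [1.0, 0.0, 0.0]},
    {"text": "span 1", "vec": [0.0, 1.0, 0.1]},
    {"text": "span 2", "vec": [0.0, 0.0, 1.0]},
]

retrieved = retrieve_cases(query_vec, casebase, k=2)
best = select_span(candidate_spans, retrieved)

# The retrieved case ids serve as the evidence to which the prediction is attributed.
evidence_ids = [case["id"] for case in retrieved]
```

In this sketch, `evidence_ids` plays the role of the attribution mentioned in the abstract: the chosen span can be traced back to the specific cases that produced its score, which is what makes a wrong prediction debuggable.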