We present an accurate and interpretable method for answer extraction in machine reading comprehension that is reminiscent of case-based reasoning (CBR) from classical AI. Our method (CBR-MRC) builds on the hypothesis that contextualized answers to similar questions are semantically similar to one another. Given a target question, CBR-MRC retrieves a set of similar questions from a memory of observed cases and predicts an answer by selecting the span in the target context that is most similar to the contextualized representations of the answers in the retrieved cases. The semi-parametric nature of our approach allows CBR-MRC to attribute a prediction to the specific set of cases used during inference, making it a desirable choice for building reliable and debuggable QA systems. We show that CBR-MRC achieves high test accuracy comparable with large reader models, outperforming baselines by 11.5 and 8.4 EM on NaturalQuestions and NewsQA, respectively. We further demonstrate that CBR-MRC identifies not just the correct answer tokens but also the span with the most relevant supporting evidence. Lastly, we observe that contexts for certain question types show higher lexical diversity than others, and we find CBR-MRC to be robust to these variations, whereas the performance of fully-parametric methods drops.
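The retrieve-then-select procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`retrieve_cases`, `select_span`, `predict`), the use of cosine similarity, the mean aggregation over retrieved answers, and the toy vectors standing in for encoder outputs are all assumptions made for illustration.

```python
import numpy as np


def cosine_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def retrieve_cases(query_vec, case_question_vecs, k):
    """Return indices of the k stored questions most similar to the query."""
    sims = cosine_sim(query_vec[None, :], case_question_vecs)[0]
    return np.argsort(-sims)[:k]


def select_span(span_vecs, case_answer_vecs):
    """Score each candidate span by its mean similarity to the retrieved
    answer representations (mean aggregation is an assumption)."""
    scores = cosine_sim(span_vecs, case_answer_vecs).mean(axis=1)
    return int(np.argmax(scores)), scores


def predict(query_vec, span_vecs, memory_q, memory_a, k=2):
    """Retrieve similar cases, then pick the best-matching span.
    Returning the case indices lets a prediction be traced to its cases."""
    case_idx = retrieve_cases(query_vec, memory_q, k)
    best, scores = select_span(span_vecs, memory_a[case_idx])
    return best, case_idx, scores


# Toy memory of three cases: question vectors and contextualized answer vectors.
memory_q = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
memory_a = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])

# Target question and three candidate spans from its context (hypothetical values).
query = np.array([1.0, 0.05])
spans = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

best, used_cases, scores = predict(query, spans, memory_q, memory_a, k=2)
print("predicted span index:", best)
print("cases used for the prediction:", used_cases.tolist())
```

Because the prediction depends only on the retrieved cases, the returned case indices give a direct record of which stored examples supported the selected span, which is the kind of attribution the abstract refers to.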