We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g., neural or keyword-based), does not require any domain- or task-specific training (and is therefore expected to generalize better to data distribution shifts), and provides rich cross-attention between query and passage (i.e., it must explain every token in the question). When evaluated on a number of open-domain retrieval datasets, our re-ranker improves strong unsupervised retrieval models by 6%–18% absolute and strong supervised models by up to 12% in terms of top-20 passage retrieval accuracy. We also obtain new state-of-the-art results on full open-domain question answering by simply adding the new re-ranker to existing models with no further changes.
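The scoring rule can be sketched as follows. The sketch ranks each retrieved passage z by the mean log-probability of the question tokens, (1/|q|) Σ_t log p(q_t | q_<t, z). It is an illustration under stated assumptions, not the authors' implementation. The names `TokenLogProbFn`, `tokenize`, `toy_token_log_probs`, `question_likelihood`, and `rerank` are hypothetical. In particular, `toy_token_log_probs` is a smoothed unigram stand-in that lets the example run without downloading a model. A pre-trained language model conditioned on the passage would take its place in the described method.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical interface: returns log p(q_t | q_<t, passage) for every question
# token. In the described re-ranker these values would come from a pre-trained
# language model conditioned on the passage; the toy model below stands in for it.
TokenLogProbFn = Callable[[str, Sequence[str]], List[float]]


def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer (an illustrative assumption)."""
    return re.findall(r"\w+", text.lower())


def toy_token_log_probs(passage: str, question_tokens: Sequence[str]) -> List[float]:
    """Toy stand-in for an LM: add-one smoothed unigram model of the passage.

    Unlike a real language model, it ignores the question prefix q_<t.
    """
    words = tokenize(passage)
    counts = Counter(words)
    vocab = set(words) | set(question_tokens)
    denom = len(words) + len(vocab)
    return [math.log((counts[t] + 1) / denom) for t in question_tokens]


def question_likelihood(passage: str, question_tokens: Sequence[str],
                        token_log_probs: TokenLogProbFn) -> float:
    """Relevance score: mean log-probability of the question tokens given the passage."""
    log_probs = token_log_probs(passage, question_tokens)
    return sum(log_probs) / len(log_probs)


def rerank(question: str, passages: Sequence[str],
           token_log_probs: TokenLogProbFn = toy_token_log_probs) -> List[str]:
    """Re-order passages returned by any retriever, most likely question first."""
    q_tokens = tokenize(question)
    return sorted(passages,
                  key=lambda p: question_likelihood(p, q_tokens, token_log_probs),
                  reverse=True)


if __name__ == "__main__":
    retrieved = [
        "The Eiffel Tower is located in Paris.",
        "Hamlet is a tragedy written by William Shakespeare.",
        "Python is a programming language.",
    ]
    for passage in rerank("Who wrote the tragedy Hamlet?", retrieved):
        print(passage)
```

With the toy model, the Hamlet passage is ranked first because it assigns the highest likelihood to the question. The retriever's own scores are discarded, so the re-ranker is agnostic to how the candidates were obtained. Dividing by |q| does not change the order of passages for a single question, since |q| is the same for every candidate. It only matters when scores are compared across questions.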