GripRank: Bridging the Gap between Retrieval and Generation via the Generative Knowledge Improved Passage Ranking

Retrieval-enhanced text generation has shown remarkable progress on knowledge-intensive language tasks, such as open-domain question answering and knowledge-enhanced dialogue generation, by leveraging passages retrieved from a large passage corpus to deliver a proper answer to the input query. However, the retrieved passages are not ideal for guiding answer generation because of a discrepancy between retrieval and generation: all candidate passages are treated equally during retrieval, without considering their potential to generate a proper answer. As a result, the passage retriever delivers a sub-optimal collection of candidate passages for generating the answer. In this paper, we propose the GeneRative Knowledge Improved Passage Ranking (GripRank) approach, which addresses this challenge by distilling knowledge from a generative passage estimator (GPE) to a passage ranker, where the GPE is a generative language model used to measure how likely each candidate passage is to generate the proper answer. We realize the distillation procedure by training the passage ranker to rank the passages in the order given by the GPE. Furthermore, we improve the distillation quality by devising a curriculum knowledge distillation mechanism, which allows the knowledge provided by the GPE to be progressively distilled to the ranker through an easy-to-hard curriculum, enabling the passage ranker to correctly recognize the provenance of the answer among many plausible candidates. We conduct extensive experiments on four datasets across three knowledge-intensive language tasks. Experimental results show advantages over state-of-the-art methods for both passage ranking and answer generation on the KILT benchmark.
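
The distillation idea above can be illustrated with a minimal sketch. The abstract does not specify the loss function or the exact form of the curriculum, so everything below is an assumption made for illustration: the GPE scores are taken to be log-likelihoods of the answer given the query and a passage, the ranker is trained with a listwise KL-divergence loss against the GPE's score distribution, and the curriculum is modeled by picking distractor passages whose GPE scores move closer to the top passage as a `difficulty` value grows from 0 to 1. The function names (`listwise_distill_loss`, `curriculum_candidates`) are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_distill_loss(gpe_scores, ranker_scores):
    """KL(teacher || student) over one query's candidate list.

    ASSUMPTION: a listwise KL loss is one plausible way to make the
    ranker follow the GPE's ordering; the paper may use another loss.
    """
    teacher = softmax(gpe_scores)
    student = softmax(ranker_scores)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


def curriculum_candidates(gpe_scores, difficulty, list_size):
    """Pick candidate indices for one training list.

    ASSUMPTION: the easy-to-hard curriculum is modeled by always keeping
    the passage the GPE scores highest and choosing distractors that are
    far below it when difficulty is 0 and right below it when it is 1.
    """
    order = sorted(range(len(gpe_scores)), key=lambda i: gpe_scores[i], reverse=True)
    top, rest = order[0], order[1:]
    n_distractors = list_size - 1
    max_offset = len(rest) - n_distractors
    # difficulty 0 -> start at the tail (easy); difficulty 1 -> start at the head (hard)
    start = int((1.0 - difficulty) * max_offset)
    return [top] + rest[start:start + n_distractors]


if __name__ == "__main__":
    # Hypothetical GPE log-likelihoods for five candidate passages.
    gpe = [-1.0, -5.0, -2.0, -9.0, -3.0]
    for difficulty in (0.0, 0.5, 1.0):
        idx = curriculum_candidates(gpe, difficulty, list_size=3)
        teacher_scores = [gpe[i] for i in idx]
        ranker_scores = [0.0] * len(idx)  # an untrained ranker scoring all passages equally
        loss = listwise_distill_loss(teacher_scores, ranker_scores)
        print(difficulty, idx, round(loss, 4))
```

In this sketch, early training steps would use a low `difficulty`, so the ranker only needs to separate the best passage from clearly poor ones; later steps raise `difficulty`, so the distractors become the plausible near-misses the abstract refers to.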
