In many natural language processing tasks, passage retrieval and passage re-ranking are two key procedures for finding and ranking relevant information. Since both procedures contribute to the final performance, it is important to optimize them jointly so that they improve each other. In this paper, we propose a novel joint training approach for dense passage retrieval and passage re-ranking. A major contribution is dynamic listwise distillation, in which we design a unified listwise training approach for both the retriever and the re-ranker. During dynamic distillation, the retriever and the re-ranker are adaptively improved according to each other's relevance information. We also propose a hybrid data augmentation strategy to construct diverse training instances for the listwise training approach. Extensive experiments show the effectiveness of our approach on both the MSMARCO and Natural Questions datasets. Our code is available at https://github.com/PaddlePaddle/RocketQA.
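To make the idea of listwise distillation concrete, the following is a minimal sketch and not the released implementation. The abstract does not state the loss, so the sketch assumes that each model's scores over one query's candidate list are normalized with a softmax, that the two resulting distributions are pulled together with a KL divergence, and that the re-ranker additionally receives a supervised listwise cross-entropy term on the labeled positive. The function names (`log_softmax`, `listwise_kl`, `supervised_listwise_loss`, `joint_loss`) are hypothetical and chosen for illustration only.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over one candidate list."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def listwise_kl(retriever_scores, reranker_scores):
    """KL(retriever || re-ranker) between the two listwise distributions.

    Assumption: both models score the same ordered list of candidates.
    """
    log_p = log_softmax(retriever_scores)
    log_q = log_softmax(reranker_scores)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def supervised_listwise_loss(reranker_scores, positive_index):
    """Listwise cross-entropy: negative log-probability of the positive passage."""
    return -log_softmax(reranker_scores)[positive_index]


def joint_loss(retriever_scores, reranker_scores, positive_index):
    """Assumed joint objective: distillation term plus supervised term."""
    return (listwise_kl(retriever_scores, reranker_scores)
            + supervised_listwise_loss(reranker_scores, positive_index))


# One query with four candidate passages; index 0 is the labeled positive.
retriever = [2.1, 1.9, 0.3, -0.5]
reranker = [4.0, 0.5, 0.2, -1.0]
print(joint_loss(retriever, reranker, positive_index=0))
```

In an actual training loop, both score lists would come from differentiable models, and gradients of the distillation term would flow to both of them, which is one way to read the statement that the retriever and the re-ranker adapt to each other's relevance information.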