Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems. Current state-of-the-art text retrieval models leverage pre-trained large language models (LLMs) to achieve competitive performance, but training LLM-based retrievers with typical contrastive losses requires intricate heuristics, including selecting hard negatives and using additional supervision as learning signals. This reliance on heuristics stems from the fact that the contrastive loss is itself a heuristic and does not directly optimize the downstream metrics of decision quality at the end of the processing pipeline. To address this issue, we introduce Neural PG-RANK, a novel training algorithm that learns to rank by instantiating an LLM as a Plackett-Luce ranking policy. Neural PG-RANK provides a principled method for end-to-end training of retrieval models as part of larger decision systems via policy gradient, with little reliance on complex heuristics, and it effectively unifies the training objective with downstream decision-making quality. We conduct extensive experiments on various text retrieval benchmarks. The results demonstrate that when the training objective aligns with the evaluation setup, Neural PG-RANK yields remarkable in-domain performance improvements, with substantial out-of-domain generalization to some critical datasets employed in downstream question answering tasks.
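To make the idea of a Plackett-Luce ranking policy trained by policy gradient more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that plain per-document scores stand in for the LLM's query-document relevance scores, that DCG is the downstream utility, and that a leave-one-out baseline is used for variance reduction. All function names (`sample_ranking`, `log_prob`, `grad_log_prob`, `dcg`, `reinforce_gradient`) are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_ranking(scores, rng):
    """Sample a ranking from a Plackett-Luce model: draw documents one at a
    time without replacement, each with probability proportional to exp(score)."""
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        m = max(scores[i] for i in remaining)  # shift for numerical stability
        weights = [math.exp(scores[i] - m) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


def log_prob(scores, ranking):
    """log P(ranking | scores) = sum_k [ s_{r_k} - logsumexp(s_{r_k}, ..., s_{r_n}) ]."""
    total = 0.0
    for k in range(len(ranking)):
        tail = [scores[i] for i in ranking[k:]]
        m = max(tail)
        total += scores[ranking[k]] - (m + math.log(sum(math.exp(s - m) for s in tail)))
    return total


def grad_log_prob(scores, ranking):
    """Gradient of log_prob with respect to the scores."""
    grad = [0.0] * len(scores)
    for k in range(len(ranking)):
        tail = ranking[k:]
        m = max(scores[i] for i in tail)
        exps = [math.exp(scores[i] - m) for i in tail]
        z = sum(exps)
        grad[ranking[k]] += 1.0          # derivative of the chosen score
        for i, e in zip(tail, exps):     # derivative of the log-normalizer
            grad[i] -= e / z
    return grad


def dcg(ranking, relevance):
    """Discounted cumulative gain, used here as an assumed utility."""
    return sum(relevance[doc] / math.log2(pos + 2) for pos, doc in enumerate(ranking))


def reinforce_gradient(scores, relevance, num_samples, rng):
    """Monte Carlo (REINFORCE) estimate of d E[DCG] / d scores.
    Assumption: a leave-one-out baseline; requires num_samples >= 2."""
    samples = [sample_ranking(scores, rng) for _ in range(num_samples)]
    utils = [dcg(r, relevance) for r in samples]
    total_util = sum(utils)
    grad = [0.0] * len(scores)
    for r, u in zip(samples, utils):
        baseline = (total_util - u) / (num_samples - 1)
        g = grad_log_prob(scores, r)
        for i in range(len(scores)):
            grad[i] += (u - baseline) * g[i] / num_samples
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    # Three candidate documents with equal scores; only document 0 is relevant.
    print(reinforce_gradient([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], 2000, rng))
```

In a full system the gradient with respect to the scores would be backpropagated into the model that produces them. Here the scores are treated as free parameters to keep the sketch self-contained.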