Dense passage retrieval (DPR) is the first step in the retrieval-augmented generation (RAG) paradigm for improving the performance of large language models (LLMs). DPR fine-tunes pre-trained networks to better align the embeddings of queries and relevant passages. A deeper understanding of DPR fine-tuning is needed to fully unlock the potential of this approach. In this work, we study DPR-trained models mechanistically, using a combination of probing, layer activation analysis, and model editing. Our experiments show that DPR training decentralizes how knowledge is stored in the network, creating multiple access pathways to the same information. We also uncover a limitation of this training style: the internal knowledge of the pre-trained model bounds what the retrieval model can retrieve. These findings suggest several directions for dense retrieval: (1) expose the DPR training process to more knowledge so more can be decentralized, (2) inject facts as decentralized representations, (3) model and incorporate knowledge uncertainty in the retrieval process, and (4) map internal model knowledge directly to a knowledge base.
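The embedding alignment that DPR fine-tuning optimizes can be illustrated with a minimal sketch of the standard contrastive objective with in-batch negatives: each query's positive passage sits at the same batch index, and the other passages in the batch serve as negatives. This is an illustrative NumPy sketch, not the paper's implementation; the function names are our own.

```python
import numpy as np

def dpr_scores(query_embs, passage_embs):
    """Dot-product similarity matrix between query and passage embeddings."""
    return query_embs @ passage_embs.T

def in_batch_nll(query_embs, passage_embs):
    """Contrastive negative log-likelihood with in-batch negatives:
    the positive passage for query i is passage i; all other passages
    in the batch act as negatives."""
    scores = dpr_scores(query_embs, passage_embs)        # shape (B, B)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # loss over positives
```

When query and passage embeddings for matching pairs point in the same direction, the diagonal of the score matrix dominates and the loss approaches zero; training pulls matching pairs together and pushes in-batch negatives apart.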