Dense retrieval models use bi-encoder architectures to learn query and document representations. These representations are typically vectors, and their similarity is usually computed with the dot product. In this paper, we propose a new representation learning framework for dense retrieval. Instead of learning a single vector for each query and document, our framework learns a multivariate distribution for each and uses the negative multivariate KL divergence to compute the similarity between distributions. For simplicity and efficiency, we assume that the distributions are multivariate normals and train large language models to produce their mean and variance vectors. We provide a theoretical foundation for the proposed framework and show that it can be seamlessly integrated into existing approximate nearest neighbor algorithms to perform retrieval efficiently. We conduct an extensive suite of experiments on a wide range of datasets and demonstrate significant improvements over competitive dense retrieval models.
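To make the scoring function concrete, the following is a minimal sketch of how a negative KL divergence between two multivariate normals could be rewritten as a dot product, which is the form that approximate nearest neighbor indexes typically expect. It rests on two assumptions not stated above: the covariance matrices are diagonal (suggested by the "mean and variance vectors"), and the divergence is taken from the query distribution to the document distribution. All function names are hypothetical and chosen for illustration only.

```python
import math

def neg_kl_diag(mu_q, var_q, mu_d, var_d):
    """Negative KL(Q || D) for two normals with diagonal covariance.

    Assumption: the direction (query to document) is chosen for illustration.
    """
    kl = 0.5 * sum(
        math.log(vd / vq) + (vq + (mq - md) ** 2) / vd - 1.0
        for mq, vq, md, vd in zip(mu_q, var_q, mu_d, var_d)
    )
    return -kl

def encode_query(mu_q, var_q):
    """Hypothetical query-side vector of length 3k + 1."""
    return [1.0] + list(var_q) + [m * m for m in mu_q] + list(mu_q)

def encode_doc(mu_d, var_d):
    """Hypothetical document-side vector of length 3k + 1.

    The first entry collects every term that depends on the document only.
    """
    prior = -0.5 * sum(math.log(v) + m * m / v for m, v in zip(mu_d, var_d))
    half_inv = [-0.5 / v for v in var_d]
    return [prior] + half_inv + half_inv + [m / v for m, v in zip(mu_d, var_d)]

def query_offset(var_q):
    """Query-only term; constant across documents, so it never changes a ranking."""
    return 0.5 * sum(math.log(v) for v in var_q) + 0.5 * len(var_q)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

if __name__ == "__main__":
    mu_q, var_q = [0.2, -1.0, 0.5], [0.5, 1.5, 0.8]
    mu_d, var_d = [0.1, -0.9, 0.4], [0.6, 1.2, 1.0]
    exact = neg_kl_diag(mu_q, var_q, mu_d, var_d)
    via_dot = dot(encode_query(mu_q, var_q), encode_doc(mu_d, var_d))
    # The two values differ only by the query-only offset.
    print(exact, via_dot + query_offset(var_q))
```

Because the offset depends on the query alone, ranking documents by the dot product gives the same order as ranking by the negative KL divergence under these assumptions, so the document vectors could be indexed once with any maximum inner product search structure.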