The performance of dense retrieval (DR) is significantly influenced by the quality of negative sampling. Traditional DR methods depend primarily on naive negative sampling or on mining hard negatives with an external retriever and meticulously crafted strategies. However, naive negative sampling often fails to capture the accurate boundary between positive and negative samples, whereas existing hard negative sampling methods are prone to false negatives, causing performance degradation and training instability. Recent advances in large language models (LLMs) offer an innovative solution to these challenges: generating contextually rich and diverse negative samples. In this work, we present a framework that harnesses LLMs to synthesize high-quality hard negative samples. We first devise a \textit{multi-attribute self-reflection prompting strategy} to guide LLMs in hard negative generation. We then apply a \textit{hybrid sampling strategy} that integrates these synthetic negatives with traditionally retrieved negatives, stabilizing the training process and improving retrieval performance. Extensive experiments on five benchmark datasets demonstrate the efficacy of our approach; our code is publicly available.