Recent developments in large language models (LLMs) have shown promise in generating synthetic query-document pairs when prompted with as few as 8 demonstrations. This has enabled building better IR models, especially for tasks with no readily available training data. Typically, such synthetic query generation (QGen) approaches either condition on an input context (e.g., a text document) and generate a query relevant to that context, or additionally condition the QGen model on the relevance label (e.g., relevant vs. irrelevant) to generate queries across relevance buckets. However, we find that such QGen approaches are sub-optimal, as they require the model to reason about the desired label and the input from only a handful of examples. In this work, we propose to reduce this burden on LLMs by generating queries for different labels simultaneously. We hypothesize that, instead of asking the model to generate, say, an irrelevant query given an input context, asking the model to generate an irrelevant query relative to a relevant query is a much simpler task setup for the model to reason about. Extensive experimentation across seven IR datasets shows that synthetic queries generated in this fashion translate to better downstream performance, suggesting that the generated queries are indeed of higher quality.
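To make the contrast between the two setups concrete, the following is a minimal sketch of how the prompts might be assembled. It is illustrative only: the prompt layout, the field names (`Document`, `Label`, `Relevant query`, `Irrelevant query`), and the helpers `label_conditioned_prompt`, `joint_prompt`, and `parse_joint_completion` are assumptions made for this example, not the templates used in the experiments. The call to an LLM is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Demo:
    """One few-shot demonstration (hypothetical structure)."""
    document: str
    relevant_query: str
    irrelevant_query: str


def label_conditioned_prompt(demos: List[Demo], document: str, label: str) -> str:
    """Baseline setup: one query per call, conditioned on document and label."""
    parts = []
    for d in demos:
        query = d.relevant_query if label == "relevant" else d.irrelevant_query
        parts.append(f"Document: {d.document}\nLabel: {label}\nQuery: {query}")
    # The model is expected to continue after the final "Query:".
    parts.append(f"Document: {document}\nLabel: {label}\nQuery:")
    return "\n\n".join(parts)


def joint_prompt(demos: List[Demo], document: str) -> str:
    """Simultaneous setup: each demonstration shows both queries, so the
    irrelevant query is written right after, and relative to, the relevant one."""
    parts = []
    for d in demos:
        parts.append(
            f"Document: {d.document}\n"
            f"Relevant query: {d.relevant_query}\n"
            f"Irrelevant query: {d.irrelevant_query}"
        )
    # The model is expected to produce both queries in a single completion.
    parts.append(f"Document: {document}\nRelevant query:")
    return "\n\n".join(parts)


def parse_joint_completion(completion: str) -> Tuple[str, str]:
    """Split a completion of the joint prompt into (relevant, irrelevant)."""
    relevant, sep, rest = completion.partition("Irrelevant query:")
    if not sep:
        raise ValueError("completion does not contain an irrelevant query")
    # Keep only the first line after the marker; drop any trailing text.
    lines = rest.strip().splitlines()
    irrelevant = lines[0].strip() if lines else ""
    return relevant.strip(), irrelevant
```

In this sketch, the baseline needs one call per label, whereas the joint prompt yields a relevant and an irrelevant query for the same document in one completion, which `parse_joint_completion` then splits into a labeled pair.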