Query rewriting aims to generate a new query that complements the original query and improves the performance of an information retrieval system. Recent query rewriting methods, such as query2doc (Q2D), query2expand (Q2E) and query2cot (Q2C), rely on the internal knowledge of Large Language Models (LLMs) to generate a relevant passage that adds information to the query. However, the effectiveness of these methods may decline markedly when the required knowledge is not captured in the model's parameters. In this paper, we propose a novel structured query rewriting method, called Crafting the Path, tailored for retrieval systems. Crafting the Path is a three-step process in which each step crafts query-related information needed to find the passages to be retrieved. Specifically, Crafting the Path begins with Query Concept Comprehension, proceeds to Query Type Identification, and finally conducts Expected Answer Extraction. Experimental results show that our method outperforms previous rewriting methods, especially in domains that are less familiar to LLMs. We demonstrate that our method depends less on the model's internal parametric knowledge and generates queries with fewer factual inaccuracies. Furthermore, we observe that Crafting the Path has lower latency than the baselines.
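
To make the three-step structure concrete, the following is a minimal sketch of how such a rewriting prompt might be assembled and how the LLM output might be attached to the original query. Only the three step names come from the abstract. The instruction wording, the function names `build_rewrite_prompt` and `expand_query`, and the plain concatenation used to form the expanded query are illustrative assumptions, not the prompt or procedure used in the paper.

```python
# Step names are taken from the abstract; the instruction text for each
# step is a hypothetical placeholder, not the paper's actual prompt.
STEPS = (
    ("Query Concept Comprehension",
     "Describe the core concepts the query is about."),
    ("Query Type Identification",
     "State what kind of information the query asks for."),
    ("Expected Answer Extraction",
     "State what form the expected answer should take."),
)


def build_rewrite_prompt(query: str) -> str:
    """Assemble a structured prompt that walks an LLM through the three steps."""
    lines = [f"Query: {query}", ""]
    for i, (name, instruction) in enumerate(STEPS, start=1):
        lines.append(f"Step {i} ({name}): {instruction}")
    return "\n".join(lines)


def expand_query(query: str, rewritten: str) -> str:
    """Combine the original query with the LLM's rewritten text.

    Simple concatenation is an assumption made for illustration only.
    """
    return f"{query.strip()} {rewritten.strip()}"


if __name__ == "__main__":
    print(build_rewrite_prompt("who invented the telephone"))
```

In this sketch the LLM call itself is left out: the string returned by `build_rewrite_prompt` would be sent to whichever model is used, and its response passed to `expand_query` before retrieval.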