In this paper, we study how open-source large language models (LLMs) can be effectively deployed to improve query rewriting in conversational search, especially for ambiguous queries. We introduce CHIQ, a two-step method that leverages the capabilities of LLMs to resolve ambiguities in the conversation history before query rewriting. This approach contrasts with prior studies, which predominantly use closed-source LLMs to generate search queries directly from the conversation history. We demonstrate on five well-established benchmarks that CHIQ achieves state-of-the-art results in most settings, showing highly competitive performance against systems that leverage closed-source LLMs. Our study is a first step towards leveraging open-source LLMs in conversational search as a competitive alternative to the prevailing reliance on commercial LLMs. Data, models, and source code will be made publicly available upon acceptance at https://github.com/fengranMark/CHIQ.