We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling embeddings to adapt in real time to the target task. We conduct extensive experiments with state-of-the-art text embedding models across a diverse set of challenging search and classification benchmarks. Empirical results indicate that LLM-guided query refinement yields consistent gains across all models and datasets, with relative improvements of up to 25% in literature search, intent detection, key-point matching, and nuanced query-instruction following. The refined queries improve ranking quality and induce clearer binary separation across the corpus, enabling the embedding space to better reflect the nuanced, task-specific constraints of each ad-hoc user query. Importantly, this expands the range of practical settings in which embedding models can be effectively deployed, making them a compelling alternative when costly LLM pipelines are not viable at corpus scale. We release our experimental code for reproducibility at https://github.com/IBM/task-aware-embedding-refinement.
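To make the refinement idea concrete, the following is a minimal sketch of one way LLM feedback on a few documents could adjust a query embedding. The abstract does not specify the update rule, so this sketch substitutes a classic Rocchio-style relevance-feedback update as an assumption; it is not the released implementation. The function names, the weights `alpha`, `beta`, and `gamma`, and the boolean `llm_labels` (standing in for relevance judgments from a generative LLM) are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def centroid(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean of the vectors; a zero vector if the set is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    a_n, b_n = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def refine_query(
    query: Sequence[float],
    doc_embeddings: Sequence[Sequence[float]],
    llm_labels: Sequence[bool],  # hypothetical: True if the LLM judged the doc relevant
    alpha: float = 1.0,   # weight on the original query (assumed value)
    beta: float = 0.75,   # pull toward LLM-accepted documents (assumed value)
    gamma: float = 0.25,  # push away from LLM-rejected documents (assumed value)
) -> Vector:
    """Rocchio-style update: move the query toward accepted docs, away from rejected ones."""
    dim = len(query)
    accepted = [d for d, keep in zip(doc_embeddings, llm_labels) if keep]
    rejected = [d for d, keep in zip(doc_embeddings, llm_labels) if not keep]
    pos, neg = centroid(accepted, dim), centroid(rejected, dim)
    refined = [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]
    return normalize(refined)


# Toy 2-D example: the LLM accepts the first document and rejects the second.
query = [1.0, 0.0]
docs = [[0.6, 0.8], [0.8, -0.6]]
refined = refine_query(query, docs, llm_labels=[True, False])

# The refined query should sit closer to the accepted document
# and further from the rejected one than the original query did.
print(cosine(query, docs[0]), cosine(refined, docs[0]))
print(cosine(query, docs[1]), cosine(refined, docs[1]))
```

In this toy case the refined query ranks the accepted document above the rejected one, whereas the original query ranked them the other way around. Only the small labeled sample requires LLM calls; the rest of the corpus is scored with ordinary embedding similarity against the refined vector, which is what keeps such a scheme inexpensive at corpus scale.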