In the rapidly evolving landscape of information retrieval, search engines strive to provide more personalized and relevant results to users. Query suggestion systems play a crucial role in achieving this goal by assisting users in formulating effective queries. However, existing query suggestion systems rely mainly on textual inputs, which may limit the search experience for users querying with images. In this paper, we introduce a novel Multimodal Query Suggestion (MMQS) task, which aims to generate query suggestions from user query images to improve the intentionality and diversity of search results. We present the RL4Sugg framework, which leverages Large Language Models (LLMs) with Multi-Agent Reinforcement Learning from Human Feedback to optimize the generation process. Through comprehensive experiments, we validate the effectiveness of RL4Sugg, demonstrating an 18% improvement over the best existing approach. Moreover, MMQS has been deployed in real-world search engine products, yielding enhanced user engagement. Our research advances query suggestion systems and provides a new perspective on multimodal information retrieval.