Re-ranking plays a crucial role in modern information search systems by refining the order of initial search results to better satisfy user information needs. However, existing methods show two notable limitations in improving user search satisfaction: inadequate modeling of multifaceted user intents and neglect of rich side information such as visual perception signals. To address these challenges, we propose the Rich-Media Re-Ranker framework, which aims to enhance user search satisfaction through multi-dimensional and fine-grained modeling. Our approach begins with a Query Planner that analyzes the sequence of query refinements within a session to capture genuine search intents, decomposing the query into clear and complementary sub-queries that cover a broader range of the user's potential intents. Moving beyond primary text content, we then integrate richer side information about candidate results, including signals that model visual content, generated by a VLM-based evaluator. These signals are processed together with carefully designed re-ranking principles that consider multiple facets: content relevance and quality, information gain, information novelty, and the visual presentation of cover images. An LLM-based re-ranker then performs a holistic evaluation based on these principles and the integrated signals. To improve the scenario adaptability of the VLM-based evaluator and the LLM-based re-ranker, we further train both with multi-task reinforcement learning. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art baselines. Notably, the proposed framework has been deployed in a large-scale industrial search system, yielding substantial improvements in online user engagement rates and satisfaction metrics.
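To make the interplay of the facets named above concrete, the following is a minimal illustrative sketch, not the framework's implementation. It assumes the Query Planner has already produced a set of sub-queries, and it replaces the LLM-based re-ranker with a hand-written weighted sum so the example stays runnable. The names `Candidate` and `rerank`, the fields `relevance`, `visual`, and `topics`, and the weights are all hypothetical; in the described framework the visual signal would come from the VLM-based evaluator and the final judgment from the LLM-based re-ranker.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    relevance: float   # assumed content relevance/quality score in [0, 1]
    visual: float      # assumed cover-image score in [0, 1] (stand-in for a VLM signal)
    topics: frozenset  # assumed set of sub-query intents this result covers


def rerank(candidates, sub_queries, w_rel=0.5, w_gain=0.3, w_vis=0.2):
    """Greedy re-ranking: at each step pick the candidate with the best
    weighted mix of relevance, coverage of not-yet-covered sub-queries
    (a crude proxy for information gain/novelty), and visual score."""
    wanted = set(sub_queries)
    covered = set()
    remaining = list(candidates)
    ranked = []
    while remaining:
        def score(c):
            # Sub-query intents this candidate would newly cover.
            new = (c.topics & wanted) - covered
            gain = len(new) / len(wanted) if wanted else 0.0
            return w_rel * c.relevance + w_gain * gain + w_vis * c.visual

        best = max(remaining, key=score)
        ranked.append(best)
        covered |= best.topics & wanted
        remaining.remove(best)
    return ranked


# Hypothetical data: two results about "price", one about "reviews".
results = [
    Candidate("a", relevance=0.90, visual=0.5, topics=frozenset({"price"})),
    Candidate("b", relevance=0.85, visual=0.5, topics=frozenset({"price"})),
    Candidate("c", relevance=0.70, visual=0.6, topics=frozenset({"reviews"})),
]
ranked = rerank(results, sub_queries=["price", "reviews"])
print([c.doc_id for c in ranked])
```

In this toy setting, ordering by relevance alone would place `b` above `c`; the coverage term promotes `c` because it addresses a sub-query that the top result leaves uncovered, which is the kind of trade-off the re-ranking principles are meant to capture.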