Re-ranking plays a crucial role in modern search systems by refining the order of initial search results to better satisfy user information needs. However, existing methods have two notable limitations in improving user search satisfaction: inadequate modeling of multifaceted user intents and neglect of rich side information such as visual perception signals. To address these challenges, we propose the Rich-Media Re-Ranker framework, which aims to enhance user search satisfaction through multi-dimensional, fine-grained modeling. Our approach begins with a Query Planner that analyzes the sequence of query refinements within a session to capture the user's genuine search intents, then decomposes the query into clear, complementary sub-queries that cover a broader range of potential intents. Next, moving beyond the primary text content, we integrate richer side information about the candidate results, including visual-content signals generated by a VLM-based evaluator. These signals are processed together with carefully designed re-ranking principles that consider multiple facets: content relevance and quality, information gain, information novelty, and the visual presentation of cover images. An LLM-based re-ranker then performs a holistic evaluation based on these principles and the integrated signals. To improve the scenario adaptability of the VLM-based evaluator and the LLM-based re-ranker, we further train both with multi-task reinforcement learning. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art baselines. Notably, the proposed framework has been deployed in a large-scale industrial search system, yielding substantial improvements in online user engagement rates and satisfaction metrics.
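The pipeline described above (sub-queries, side signals, multi-facet re-ranking) can be illustrated with a minimal sketch. This is not the framework's implementation: the LLM-based re-ranker is replaced here by a plain weighted-sum heuristic with greedy selection, and the names `Candidate`, `rerank`, the score fields, and all weights are hypothetical. The per-sub-query relevance scores and the `visual` score stand in for outputs that the Query Planner and the VLM-based evaluator would supply.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    relevance: dict[str, float]  # hypothetical: sub-query -> relevance in [0, 1]
    quality: float               # hypothetical content-quality score in [0, 1]
    visual: float                # hypothetical cover-image score (stand-in for a VLM evaluator)


def rerank(candidates, sub_queries, w_rel=1.0, w_qual=0.5, w_vis=0.3, w_gain=0.7):
    """Greedy re-ranking with a weighted sum (a stand-in for the LLM-based re-ranker).

    At each step the candidate with the highest combined score is selected.
    The gain term rewards candidates that cover sub-queries better than the
    results already ranked, approximating information gain and novelty.
    """
    covered = {q: 0.0 for q in sub_queries}  # best relevance ranked so far, per sub-query
    remaining = list(candidates)
    ranked = []

    def score(c):
        # Relevance to the best-matching sub-query (0.0 if there are no sub-queries).
        rel = max((c.relevance.get(q, 0.0) for q in sub_queries), default=0.0)
        # Improvement in sub-query coverage over what is already ranked.
        gain = sum(max(0.0, c.relevance.get(q, 0.0) - covered[q]) for q in sub_queries)
        return w_rel * rel + w_qual * c.quality + w_vis * c.visual + w_gain * gain

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
        # Update coverage so later picks are rewarded for adding new information.
        for q in sub_queries:
            covered[q] = max(covered[q], best.relevance.get(q, 0.0))
    return ranked
```

With two sub-queries, a candidate that is the only one addressing the second sub-query can be promoted above a near-duplicate of the top result, which is the intent-coverage behavior the abstract describes. A learned re-ranker would replace the fixed weights with a model's holistic judgment.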