With Kuaishou serving hundreds of millions of searches daily, the quality of short-video search is paramount. However, search suffers from a severe Matthew effect on long-tail queries: sparse user behavior data causes models to amplify low-quality content such as clickbait and shallow videos. Recent advances in Large Language Models (LLMs) offer a new paradigm, as their inherent world knowledge provides a way to assess content quality that does not depend on sparse user interactions. To this end, we propose an LLM-driven multimodal reranking framework that estimates user experience without real user behavior. The approach uses a two-stage training process: the first stage uses multimodal evidence to construct high-quality annotations for supervised fine-tuning, while the second stage applies pairwise preference optimization so that the model learns partial orderings among candidates. At inference time, the resulting experience scores are used to promote high-quality but underexposed videos in reranking, and they further guide page-level optimization through reinforcement learning. Experiments show that the proposed method achieves consistent improvements over strong baselines on offline metrics, including AUC, NDCG@K, and human preference judgments. An online A/B test covering 15% of traffic further demonstrates gains in both user experience and consumption metrics, confirming the practical value of the approach in long-tail video search scenarios.
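To make the second training stage and the inference-time reranking concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Bradley-Terry style logistic loss for the pairwise preference objective and a simple linear blend of a base relevance score with the experience score for reranking; the abstract specifies neither. The names `pairwise_preference_loss`, `rerank`, `relevance`, `experience`, and `weight` are hypothetical.

```python
import math


def pairwise_preference_loss(score_preferred, score_rejected):
    """Logistic (Bradley-Terry style) loss on a preference pair.

    Assumption: the abstract does not name the exact objective; this is one
    common choice. The loss is small when the preferred candidate outscores
    the rejected one, and large when the order is reversed.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def rerank(candidates, weight=0.5):
    """Sort candidates by a blend of base relevance and experience score.

    Assumption: a linear blend with a hypothetical `weight`; the abstract
    only states that experience scores promote high-quality videos.
    """
    def blended(c):
        return (1.0 - weight) * c["relevance"] + weight * c["experience"]

    return sorted(candidates, key=blended, reverse=True)


if __name__ == "__main__":
    videos = [
        {"id": "clickbait", "relevance": 0.9, "experience": 0.2},
        {"id": "underexposed", "relevance": 0.7, "experience": 0.9},
    ]
    print([v["id"] for v in rerank(videos, weight=0.5)])
    print(pairwise_preference_loss(2.0, 0.0), pairwise_preference_loss(0.0, 2.0))
```

With `weight=0.0` the ordering falls back to base relevance alone, which makes the effect of the experience score easy to isolate in an offline comparison.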