With Kuaishou serving hundreds of millions of searches daily, the quality of short-video search is paramount. However, ranking suffers from a severe Matthew effect on long-tail queries: sparse user behavior data causes models to amplify low-quality content such as clickbait and shallow videos. Recent advances in Large Language Models (LLMs) offer a new paradigm, as their inherent world knowledge provides a powerful mechanism for assessing content quality independently of sparse user interactions. To this end, we propose an LLM-driven multimodal reranking framework that estimates user experience without real user behavior. The approach involves a two-stage training process: the first stage uses multimodal evidence to construct high-quality annotations for supervised fine-tuning, while the second stage incorporates pairwise preference optimization to help the model learn partial orderings among candidates. At inference time, the resulting experience scores are used to promote high-quality but underexposed videos in reranking, and they further guide page-level optimization through reinforcement learning. Experiments show that the proposed method achieves consistent improvements over strong baselines on offline metrics, including AUC, NDCG@K, and human preference judgement. An online A/B test covering 15\% of traffic further demonstrates gains in both user experience and consumption metrics, confirming the practical value of the approach in long-tail video search scenarios.
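The abstract names three components without giving formulas: a pairwise preference objective, NDCG@K as an offline metric, and reranking with experience scores. The following is a minimal Python sketch of each, under stated assumptions. The logistic (Bradley-Terry style) form of the pairwise loss, the linear-gain variant of NDCG, and the linear blend with a hypothetical `weight` parameter are illustrative choices and are not taken from the paper; all function names are hypothetical as well.

```python
import math


def pairwise_preference_loss(score_preferred, score_rejected):
    """Logistic pairwise loss (assumed form): small when the preferred
    candidate outscores the rejected one."""
    margin = score_preferred - score_rejected
    # log(1 + exp(-margin)), computed in a numerically stable way
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """NDCG@K: DCG of the given order divided by DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def rerank(candidates, weight=0.5):
    """Reorder (video_id, base_score, experience_score) tuples by a linear
    blend of the two scores. The blend and `weight` are assumptions."""
    return sorted(
        candidates,
        key=lambda c: (1 - weight) * c[1] + weight * c[2],
        reverse=True,
    )
```

With `weight=0` the blend falls back to the behavior-based order, and raising `weight` lets a video with a high experience score but a low behavior-based score move up, which is the promotion effect the abstract describes for underexposed content.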