Towards End-to-End Alignment of User Satisfaction via Questionnaire in Video Recommendation

Short-video recommender systems typically optimize ranking models using dense user behavioral signals such as clicks and watch time. However, these signals are only indirect proxies for user satisfaction and often suffer from noise and bias. Recently, explicit satisfaction feedback collected through questionnaires has emerged as a high-quality, direct supervision signal for alignment, but it is extremely sparse and easily overwhelmed by abundant behavioral data, making it difficult to incorporate into online recommendation models. To address these challenges, we propose EASQ, a framework for End-to-End Alignment of user Satisfaction via Questionnaire, which enables real-time alignment of ranking models with true user satisfaction. Specifically, we first construct an independent parameter pathway for sparse questionnaire signals by combining a multi-task architecture with a lightweight LoRA module. The multi-task design separates sparse satisfaction supervision from dense behavioral signals, preventing the former from being overwhelmed. The LoRA module pre-injects these preferences in a parameter-isolated manner, keeping the backbone stable while optimizing for user satisfaction. Furthermore, we employ a DPO-based optimization objective tailored for online learning, which aligns the main model's outputs with sparse satisfaction signals in real time. This design enables end-to-end online learning, allowing the model to continuously adapt to new questionnaire feedback while maintaining the stability and effectiveness of the backbone. Extensive offline experiments and large-scale online A/B tests demonstrate that EASQ consistently improves user satisfaction metrics across multiple scenarios. EASQ has been successfully deployed in a production short-video recommendation system, delivering significant and stable business gains.
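The two building blocks named above can be illustrated with a minimal sketch. It shows the generic LoRA update (a frozen weight matrix plus a scaled low-rank correction) and the standard pairwise DPO loss, not the online-tailored variant used in EASQ, whose exact form is not given here. The function names, the choice of the frozen backbone as the DPO reference, and the pairing of a "satisfied" item against a "dissatisfied" item are all assumptions made for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_linear(x, W, A, B, alpha=1.0):
    """Generic LoRA forward pass: W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen backbone weight. A (r x d_in) and
    B (d_out x r) are the only trainable parameters, so in this sketch
    sparse questionnaire gradients would never modify W.
    """
    r = len(A)  # LoRA rank
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_logp_pos, policy_logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """Standard pairwise DPO loss for one (preferred, rejected) pair.

    Assumption: "pos" is an item the user rated as satisfying in a
    questionnaire, "neg" is one rated as unsatisfying, and the reference
    log-probabilities come from a frozen copy of the model.
    """
    margin = (policy_logp_pos - ref_logp_pos) - (policy_logp_neg - ref_logp_neg)
    return -log_sigmoid(beta * margin)
```

With `B` initialised to zeros, `lora_linear` reproduces the backbone output exactly, which is one common way to keep the backbone's behavior unchanged before any questionnaire feedback arrives. When the policy and the reference agree, `dpo_loss` equals log 2, and it decreases as the policy assigns relatively more probability to the preferred item.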
