We present a novel framework for user representation in large-scale recommender systems that aims to represent diverse user tastes effectively and in a generalized manner. Our approach is a two-stage methodology that combines representation learning and transfer learning. In the first stage, a representation learning model uses an autoencoder to compress various user features into a representation space. In the second stage, task-specific downstream models leverage these user representations via transfer learning instead of curating user features individually. We further extend the representation's input features to increase flexibility and to react to user events, including new user experiences, in near real time. Additionally, we propose a novel solution to manage the deployment of this framework in production models, allowing downstream models to work independently. We validate the framework through rigorous offline and online experiments within a large-scale system, demonstrating its effectiveness across multiple evaluation tasks. Finally, we show how the proposed framework can significantly reduce infrastructure costs compared to alternative approaches.
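The two-stage idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the user features are synthetic, the autoencoder is a small linear one trained by plain gradient descent, the downstream model is a ridge-regression head, and the names `train_autoencoder` and `fit_downstream_head` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for user features (hypothetical data, not from the paper):
# 256 users, 16 features driven by 4 hidden "taste" factors plus small noise.
n_users, n_features, rep_dim = 256, 16, 4
taste = rng.normal(size=(n_users, rep_dim))
X = taste @ rng.normal(size=(rep_dim, n_features))
X += 0.1 * rng.normal(size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature


def train_autoencoder(X, dim, rng, epochs=500, lr=0.1):
    """Stage 1: linear autoencoder trained by gradient descent on the MSE."""
    d = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(d, dim))
    W_dec = rng.normal(scale=0.1, size=(dim, d))
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc              # compress features into the representation
        err = Z @ W_dec - X        # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        scale = 2.0 / err.size     # derivative of the mean of squared errors
        grad_dec = scale * (Z.T @ err)
        grad_enc = scale * (X.T @ (err @ W_dec.T))
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec, losses


def fit_downstream_head(Z, y, l2=1e-3):
    """Stage 2: ridge-regression head fitted on frozen user representations."""
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])  # append a bias column
    A = Zb.T @ Zb + l2 * np.eye(Zb.shape[1])
    return np.linalg.solve(A, Zb.T @ y)


# Stage 1: learn the shared user representation once.
W_enc, W_dec, losses = train_autoencoder(X, rep_dim, rng)
embeddings = X @ W_enc  # frozen encoder output, reused by every downstream task

# Stage 2: a downstream task trains only its own small head on the embeddings.
y = taste[:, 0] + 0.1 * rng.normal(size=n_users)  # hypothetical task target
head = fit_downstream_head(embeddings, y)
predictions = np.hstack([embeddings, np.ones((n_users, 1))]) @ head
```

In this sketch the encoder weights are never updated by the downstream task, so several task-specific heads could consume the same `embeddings` without each one curating raw user features.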