Controllable Multi-Objective Re-ranking with Policy Hypernetworks

Multi-stage ranking pipelines are widely used in modern recommender systems, where the final stage aims to return a ranked list of items that balances several requirements such as user preference, diversity, and novelty. Linear scalarization is arguably the most widely used technique for merging multiple requirements into one optimization objective: the requirements are summed using a set of preference weights. Existing final-stage ranking methods often adopt a static model in which the preference weights are fixed during offline training and kept unchanged during online serving. Whenever the preference weights need to change, the model has to be retrained, which is costly in both time and resources. Meanwhile, the most appropriate weights may vary greatly across groups of target users or across time periods (e.g., during holiday promotions). In this paper, we propose a framework called controllable multi-objective re-ranking (CMR), which incorporates a hypernetwork to generate the parameters of a re-ranking model according to different preference weights. In this way, CMR can adapt the preference weights to changes in the environment in an online manner, without retraining the models. Moreover, we classify practical business-oriented tasks into four main categories and seamlessly incorporate them into a newly proposed re-ranking model based on an Actor-Evaluator framework, which serves as a reliable real-world testbed for CMR. Offline experiments on a dataset collected from the Taobao App show that CMR improves several popular re-ranking models when they are used as its underlying models. Online A/B tests also demonstrate the effectiveness and trustworthiness of CMR.
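The two mechanisms named in the abstract, linear scalarization and a hypernetwork that maps preference weights to re-ranker parameters, can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the dimensions, the names `scalarize`, `hypernetwork`, and `rerank`, the untrained random hypernetwork weights, and the linear item scorer, which stands in for the Actor-Evaluator model described in the paper and is not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_OBJECTIVES = 3   # e.g. preference, diversity, novelty (illustrative)
ITEM_DIM = 4       # hypothetical item-feature size
HIDDEN = 8         # hypothetical hypernetwork width

# Hypernetwork parameters (random and untrained here): a small MLP that maps
# preference weights w to the parameters of a linear item scorer.
H1 = rng.normal(scale=0.5, size=(N_OBJECTIVES, HIDDEN))
H2 = rng.normal(scale=0.5, size=(HIDDEN, ITEM_DIM + 1))


def scalarize(objective_values, w):
    """Linear scalarization: weighted sum of per-objective values."""
    return float(np.dot(w, objective_values))


def hypernetwork(w):
    """Generate scorer parameters (theta, bias) from preference weights w."""
    hidden = np.tanh(w @ H1)
    out = hidden @ H2
    return out[:ITEM_DIM], out[ITEM_DIM]


def rerank(items, w):
    """Score items with parameters generated for w; return indices best-first."""
    theta, bias = hypernetwork(w)
    scores = items @ theta + bias
    return np.argsort(-scores, kind="stable")


# Five hypothetical candidate items and two preference settings.
items = rng.normal(size=(5, ITEM_DIM))
w_default = np.array([0.8, 0.1, 0.1])
w_promo = np.array([0.2, 0.2, 0.6])   # e.g. a holiday-promotion setting

order_default = rerank(items, w_default)
order_promo = rerank(items, w_promo)
```

Under these assumptions, changing `w` at serving time changes the generated scorer parameters, and possibly the resulting order, without touching `H1` or `H2`. In a trained system the hypernetwork would be optimized over many sampled weight vectors so that each generated scorer performs well on the correspondingly scalarized objective.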
