Controllable Multi-Objective Re-ranking with Policy Hypernetworks

Multi-stage ranking pipelines are widely used in modern recommender systems, where the final stage aims to return a ranked list of items that balances a number of requirements such as user preference, diversity, and novelty. Linear scalarization, which sums the requirements under a set of preference weights, is arguably the most widely used technique for merging multiple requirements into one optimization objective. Existing final-stage ranking methods often adopt a static model in which the preference weights are determined during offline training and kept unchanged during online serving. Whenever the preference weights need to be modified, the model has to be retrained, which is inefficient in both time and resources. Meanwhile, the most appropriate weights may vary greatly across groups of target users or across time periods (e.g., during holiday promotions). In this paper, we propose a framework called controllable multi-objective re-ranking (CMR), which incorporates a hypernetwork to generate the parameters of a re-ranking model according to different preference weights. In this way, CMR can adapt the preference weights to changes in the environment in an online manner, without retraining the models. Moreover, we classify practical business-oriented tasks into four main categories and seamlessly incorporate them into a newly proposed re-ranking model based on an Actor-Evaluator framework, which serves as a reliable real-world testbed for CMR. Offline experiments on a dataset collected from the Taobao App show that CMR improves several popular re-ranking models when they are used as its underlying models. Online A/B tests also demonstrate the effectiveness and trustworthiness of CMR.
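To make the two central ideas concrete, the following is a minimal sketch in plain Python, not the architecture used in CMR. It illustrates linear scalarization as a weighted sum, and a hypernetwork as a function that maps preference weights to the parameters of a scorer, so that the ranking changes at serving time without retraining. All names (`scalarize`, `LinearHypernetwork`, `rerank`), the hand-set matrix, and the toy items are assumptions made for illustration only.

```python
def scalarize(objectives, weights):
    """Linear scalarization: collapse several objective values into one
    scalar by taking their weighted sum."""
    if len(objectives) != len(weights):
        raise ValueError("objectives and weights must have the same length")
    return sum(w * o for w, o in zip(weights, objectives))


class LinearHypernetwork:
    """Hypothetical stand-in for a hypernetwork: a single linear map from
    a preference-weight vector to the parameters of a linear item scorer.
    A real hypernetwork would be a trained neural network."""

    def __init__(self, matrix):
        # matrix[i][j] is the contribution of preference weight j
        # to scorer parameter i (hand-set here, assumed learned in practice).
        self.matrix = matrix

    def generate(self, weights):
        """Return scorer parameters for the given preference weights."""
        return [sum(m * w for m, w in zip(row, weights)) for row in self.matrix]


def rerank(items, scorer_params):
    """Sort item names by descending score under the generated parameters.
    `items` maps an item name to its feature vector."""
    def score(name):
        return sum(p * f for p, f in zip(scorer_params, items[name]))
    return sorted(items, key=score, reverse=True)


# Toy items with two features: [relevance, diversity contribution].
items = {"a": [0.9, 0.1], "b": [0.5, 0.6], "c": [0.2, 0.9]}
hyper = LinearHypernetwork([[1.0, 0.0], [0.0, 2.0]])

# Changing only the preference weights changes the ranking.
relevance_first = rerank(items, hyper.generate([1.0, 0.0]))
diversity_first = rerank(items, hyper.generate([0.2, 0.8]))
```

In this sketch the scorer is never refit: only the weight vector passed to `generate` changes between the two calls, which is the property the abstract describes as adapting the preference weights online.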
