The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms. However, myopically training systems on choice data may improve only short-term engagement, without improving the long-term sustainability of the platform or the long-term benefits to its users, content providers, and other stakeholders. In this paper, we therefore develop a new framework in which decision makers (e.g., platform operators, regulators, users) can express long-term goals for the behavior of the platform (e.g., fairness, revenue distribution, legal requirements). These goals take the form of exposure or impact targets that extend well beyond individual sessions, and we provide new control-based algorithms to achieve them. In particular, the controllers are designed to achieve the stated long-term goals with minimum impact on short-term engagement. Beyond the principled theoretical derivation of the controllers, we evaluate the algorithms on both synthetic and real-world data. While all controllers perform well, we find that they offer different trade-offs among efficiency, robustness, and the ability to plan ahead.