In content recommender systems such as TikTok and YouTube, the platform's decision algorithm shapes the incentives of content producers, including how much effort they invest in the quality of their content. Many platforms employ online learning, which creates intertemporal incentives, since content produced today affects recommendations of future content. In this paper, we study the incentives arising from online learning by analyzing the quality of content produced at a Nash equilibrium. We show that classical online learning algorithms, such as Hedge and EXP3, unfortunately incentivize producers to create low-quality content. In particular, the quality of content is upper bounded in terms of the learning rate and approaches zero for typical learning rate schedules. Motivated by this negative result, we design a different learning algorithm, based on punishing producers who create low-quality content, that correctly incentivizes producers to create high-quality content. At a conceptual level, our work illustrates the unintended impact that a platform's learning algorithm can have on content quality and opens the door towards designing platform learning algorithms that incentivize the creation of high-quality content.
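For readers unfamiliar with Hedge, the following is a minimal sketch of the standard full-information exponential-weights update, in which each option is chosen with probability proportional to exp(eta * cumulative reward). It is not the paper's model or its proposed algorithm. Treating each option as a content producer and each reward as a quality signal is an assumption made here for illustration, and the function names and reward values are hypothetical. The sketch only shows how the learning rate `eta` controls how strongly past rewards shift future recommendation probabilities, which is the intertemporal link the abstract refers to.

```python
import math


def hedge_probabilities(cumulative_rewards, eta):
    """Hedge distribution: p_i is proportional to exp(eta * cumulative reward of i)."""
    # Subtracting the maximum leaves the distribution unchanged but avoids overflow.
    m = max(cumulative_rewards)
    weights = [math.exp(eta * (r - m)) for r in cumulative_rewards]
    total = sum(weights)
    return [w / total for w in weights]


def run_hedge(reward_rounds, eta):
    """Run full-information Hedge over a list of per-round reward vectors.

    Returns the distribution used in each round and the distribution
    after the final update.
    """
    n = len(reward_rounds[0])
    cumulative = [0.0] * n
    history = []
    for rewards in reward_rounds:
        # The distribution for this round depends only on past rewards.
        history.append(hedge_probabilities(cumulative, eta))
        cumulative = [c + r for c, r in zip(cumulative, rewards)]
    return history, hedge_probabilities(cumulative, eta)


if __name__ == "__main__":
    # Hypothetical rewards: producer 0 earns 1.0 per round, producer 1 earns 0.5.
    rounds = [[1.0, 0.5]] * 10
    for eta in (0.1, 1.0):
        _, final = run_hedge(rounds, eta)
        print(f"eta={eta}: final probabilities = {final}")
```

With a small `eta`, the distribution stays close to uniform even after many rounds of consistently higher reward, whereas a larger `eta` concentrates probability on the higher-reward option much faster.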