Can Probabilistic Feedback Drive User Impacts in Online Platforms?

A common explanation for negative user impacts of content recommender systems is misalignment between the platform's objective and user welfare. In this work, we show that misalignment in the platform's objective is not the only potential cause of unintended impacts on users: even when the platform's objective is fully aligned with user welfare, the platform's learning algorithm can induce negative downstream impacts on users. The source of these user impacts is that different pieces of content may generate observable user reactions (feedback information) at different rates; these feedback rates may correlate with content properties, such as controversiality or demographic similarity of the creator, that affect the user experience. Since differences in feedback rates can impact how often the learning algorithm engages with different content, the learning algorithm may inadvertently promote content with certain such properties. Using the multi-armed bandit framework with probabilistic feedback, we examine the relationship between feedback rates and a learning algorithm's engagement with individual arms for different no-regret algorithms. We prove that no-regret algorithms can exhibit a wide range of dependencies: if the feedback rate of an arm increases, some no-regret algorithms engage with the arm more, some no-regret algorithms engage with the arm less, and other no-regret algorithms engage with the arm approximately the same number of times. From a platform design perspective, our results highlight the importance of looking beyond regret when measuring an algorithm's performance, and assessing the nature of a learning algorithm's engagement with different types of content as well as their resulting downstream impacts.
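The setting described above can be sketched as a small simulation. This is a minimal illustration, not the paper's algorithms or analysis: it assumes Bernoulli rewards, a standard UCB1-style index, and a confidence bonus computed from the number of observed feedback events rather than the number of pulls. The function name `run_ucb` and its parameters (`means`, `feedback_rates`, `horizon`, `seed`) are hypothetical names chosen for this example.

```python
import math
import random


def ucb_index(total, n_obs, t):
    """UCB1-style index built from observed feedback only (an assumed design choice)."""
    if n_obs == 0:
        return float("inf")  # force exploration until some feedback is observed
    return total / n_obs + math.sqrt(2.0 * math.log(t) / n_obs)


def run_ucb(means, feedback_rates, horizon, seed=0):
    """Simulate a bandit in which pulling arm i yields feedback only with
    probability feedback_rates[i]; the reward is Bernoulli(means[i]).

    Returns (pulls, obs): how often each arm was pulled, and how often
    a pull of that arm actually produced an observation.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k    # times the learner engaged with each arm
    obs = [0] * k      # times the arm produced observable feedback
    sums = [0.0] * k   # sum of observed rewards per arm

    for t in range(1, horizon + 1):
        arm = max(range(k), key=lambda i: ucb_index(sums[i], obs[i], t))
        pulls[arm] += 1
        # Feedback arrives only with the arm's feedback rate.
        if rng.random() < feedback_rates[arm]:
            reward = 1.0 if rng.random() < means[arm] else 0.0
            obs[arm] += 1
            sums[arm] += reward
    return pulls, obs


if __name__ == "__main__":
    # Vary the feedback rate of the lower-mean arm and inspect how often it is pulled.
    for rate in (0.2, 0.5, 1.0):
        pulls, obs = run_ucb([0.5, 0.7], [rate, 1.0], horizon=5000, seed=0)
        print(f"feedback rate {rate}: pulls={pulls}, observations={obs}")
```

Running the script for several feedback rates gives a rough empirical view of how one particular algorithm's engagement with an arm responds to that arm's feedback rate. Swapping in a different index, for example one whose bonus depends on `pulls` instead of `obs`, is one way to explore how that dependence differs across algorithms.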
