Can Probabilistic Feedback Drive User Impacts in Online Platforms?

A common explanation for negative user impacts of content recommender systems is misalignment between the platform's objective and user welfare. In this work, we show that misalignment in the platform's objective is not the only potential cause of unintended impacts on users: even when the platform's objective is fully aligned with user welfare, the platform's learning algorithm can induce negative downstream impacts on users. The source of these user impacts is that different pieces of content may generate observable user reactions (feedback information) at different rates; these feedback rates may correlate with content properties, such as controversiality or demographic similarity of the creator, that affect the user experience. Since differences in feedback rates can impact how often the learning algorithm engages with different content, the learning algorithm may inadvertently promote content with certain such properties. Using the multi-armed bandit framework with probabilistic feedback, we examine the relationship between feedback rates and a learning algorithm's engagement with individual arms for different no-regret algorithms. We prove that no-regret algorithms can exhibit a wide range of dependencies: if the feedback rate of an arm increases, some no-regret algorithms engage with the arm more, some no-regret algorithms engage with the arm less, and other no-regret algorithms engage with the arm approximately the same number of times. From a platform design perspective, our results highlight the importance of looking beyond regret when measuring an algorithm's performance, and assessing the nature of a learning algorithm's engagement with different types of content as well as their resulting downstream impacts.
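
To make the setting concrete, the following is a minimal sketch of a multi-armed bandit with probabilistic feedback. It rests on several assumptions that do not come from the abstract: rewards are Bernoulli, each pull of arm i yields an observation independently with probability `feedback_rates[i]`, and the learner is a simple UCB-style rule whose confidence bonus depends on the number of observations rather than the number of pulls. The function name and parameters are hypothetical, and this rule is only one illustrative no-regret-style learner, not necessarily one of the algorithms analyzed in the paper.

```python
import math
import random


def run_ucb_with_probabilistic_feedback(means, feedback_rates, horizon, seed=0):
    """Simulate a UCB-style learner when feedback arrives only probabilistically.

    means[i]          -- assumed Bernoulli mean reward of arm i
    feedback_rates[i] -- assumed probability that a pull of arm i is observed
    Returns (pulls, observations): per-arm pull counts and observation counts.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k          # how often the learner engaged with each arm
    observations = [0] * k   # how often a pull actually produced feedback
    reward_sums = [0.0] * k  # sum of observed rewards per arm

    for t in range(1, horizon + 1):
        unobserved = [i for i in range(k) if observations[i] == 0]
        if unobserved:
            # Keep pulling an arm until it has produced at least one observation.
            arm = unobserved[0]
        else:
            # Illustrative choice: the bonus shrinks with observations, not pulls.
            arm = max(
                range(k),
                key=lambda i: reward_sums[i] / observations[i]
                + math.sqrt(2.0 * math.log(t) / observations[i]),
            )

        pulls[arm] += 1
        # Feedback is revealed only with probability feedback_rates[arm].
        if rng.random() < feedback_rates[arm]:
            observations[arm] += 1
            reward_sums[arm] += 1.0 if rng.random() < means[arm] else 0.0

    return pulls, observations


if __name__ == "__main__":
    # Vary the feedback rate of the second arm while holding the means fixed.
    for rate in (0.2, 0.5, 1.0):
        pulls, obs = run_ucb_with_probabilistic_feedback(
            means=[0.9, 0.1], feedback_rates=[0.5, rate], horizon=5000
        )
        print(f"feedback rate {rate}: pulls={pulls}, observations={obs}")
```

Running the script with different feedback rates for one arm gives a rough empirical view of how that arm's pull count responds under this particular rule. Other learners may respond in the opposite direction or not at all, which is the range of behaviors the abstract describes.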
