Recommendation systems rely on user-provided data to learn about item quality and provide personalized recommendations. Aggregating ratings into an item quality estimate implicitly assumes that ratings are strong indicators of item quality. In this work, we test this assumption using data collected from a music discovery application. Our study focuses on two factors that cause rating inflation: heterogeneous user rating behavior and the dynamics of personalized recommendations. We show that rating behavior varies substantially across users, so that item quality estimates reflect the users who rated an item more than the quality of the item itself. Additionally, items that are more likely to be shown via personalized recommendations can receive substantially more exposure and may be subject to bias in their favor. To mitigate these effects, we analyze the results of a randomized controlled trial in which the rating interface was modified. The trial resulted in a substantial improvement in user rating behavior and a reduction in item quality inflation. These findings highlight the importance of carefully considering the assumptions underlying recommendation systems and of designing interfaces that encourage accurate rating behavior.
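The effect of heterogeneous rating behavior can be illustrated with a small toy example. The sketch below is not the method or data used in this work: the users, items, ratings, and function names are all hypothetical, and per-user mean-centering is just one common way to adjust for how generously a user tends to rate. It compares a naive per-item average with an average of user-centered ratings.

```python
from collections import defaultdict
from statistics import mean

def naive_item_means(ratings):
    """Average the raw ratings each item received."""
    by_item = defaultdict(list)
    for _user, item, rating in ratings:
        by_item[item].append(rating)
    return {item: mean(values) for item, values in by_item.items()}

def user_centered_item_means(ratings):
    """Average each item's ratings after subtracting the rater's own mean."""
    by_user = defaultdict(list)
    for user, _item, rating in ratings:
        by_user[user].append(rating)
    user_mean = {user: mean(values) for user, values in by_user.items()}

    by_item = defaultdict(list)
    for user, item, rating in ratings:
        by_item[item].append(rating - user_mean[user])
    return {item: mean(values) for item, values in by_item.items()}

# Hypothetical data: one lenient user and one harsh user on a 1-5 scale.
RATINGS = [
    ("lenient", "a", 5),
    ("lenient", "b", 4),
    ("harsh", "b", 2),
    ("harsh", "c", 3),
]

naive = naive_item_means(RATINGS)
centered = user_centered_item_means(RATINGS)
print(naive)
print(centered)
```

In this toy data, item `a` was rated only by the lenient user and item `c` only by the harsh user. The naive averages place `a` two points above `c`, while the user-centered averages put them level, because each was its rater's preferred item by the same margin. The gap in the naive estimate reflects who rated the items rather than the items themselves.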