Recommendation systems rely on user-provided data to learn about item quality and to provide personalized recommendations. Aggregating ratings into an item quality estimate implicitly assumes that ratings are strong indicators of item quality. In this work, we test this assumption using data collected from a music discovery application. Our study focuses on two factors that cause rating inflation: heterogeneous user rating behavior and the dynamics of personalized recommendations. We show that rating behavior varies substantially across users, so that item quality estimates reflect the users who rated an item more than the quality of the item itself. Additionally, items that are more likely to be shown via personalized recommendations can experience a substantial increase in exposure and a potential bias in their favor. To mitigate these effects, we analyze the results of a randomized controlled trial in which the rating interface was modified. The test resulted in a substantial improvement in user rating behavior and a reduction in item quality inflation. These findings highlight the importance of carefully considering the assumptions underlying recommendation systems and of designing interfaces that encourage accurate rating behavior.
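The claim that item quality estimates can reflect who rated an item more than the item itself can be illustrated with a small sketch. The data, the user and item names, and the user-mean-centering adjustment below are all assumptions made for illustration; they are not taken from the study and are not the authors' method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (user, item, rating) on a 1-5 scale.
# "generous" tends to rate high, "harsh" tends to rate low.
ratings = [
    ("generous", "song_a", 5),
    ("generous", "song_b", 5),
    ("generous", "song_c", 4),
    ("harsh", "song_b", 3),
    ("harsh", "song_c", 1),
    ("harsh", "song_d", 2),
]


def raw_item_means(rows):
    """Average rating per item, ignoring who gave each rating."""
    by_item = defaultdict(list)
    for _user, item, rating in rows:
        by_item[item].append(rating)
    return {item: mean(values) for item, values in by_item.items()}


def user_centered_item_means(rows):
    """Average rating per item after subtracting each user's own mean.

    This is one simple, assumed way to discount per-user rating habits.
    """
    by_user = defaultdict(list)
    for user, _item, rating in rows:
        by_user[user].append(rating)
    user_mean = {user: mean(values) for user, values in by_user.items()}

    by_item = defaultdict(list)
    for user, item, rating in rows:
        by_item[item].append(rating - user_mean[user])
    return {item: mean(values) for item, values in by_item.items()}


raw = raw_item_means(ratings)
centered = user_centered_item_means(ratings)
for item in sorted(raw):
    print(f"{item}: raw={raw[item]:.2f} centered={centered[item]:+.2f}")
```

In this toy data, song_a has the highest raw mean only because its single rater is generous; after centering, song_b ranks above it. Centering is just one possible adjustment and says nothing about the interface change evaluated in the trial.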