Recommender systems have been shown to exhibit popularity bias by over-recommending popular items and under-recommending relevant niche items. We seek to understand niche users in benchmark recommendation datasets as a step toward mitigating popularity bias. We find that, compared to mainstream users, niche-preferring users exhibit a longer-tailed activity-level distribution, indicating the existence of users who both prefer niche items and exhibit high activity levels on platforms. We partition users along two axes: (1) activity level ("power" vs. "light") and (2) item-popularity preference ("mainstream" vs. "niche"), and show that in three benchmark datasets, the number of power-niche users (high activity and niche preference) is statistically significantly larger than expected. We also find that interaction data from power-niche users is especially valuable for improving recommendations for not only niche but also mainstream users. In contrast, many existing popularity bias mitigation methods have focused on upweighting niche users regardless of activity level. Motivated by the value of power-niche user data, we propose PAIR (Popularity-and-Activity-Informed Reweighting), a framework for reweighting the Bayesian Personalized Ranking (BPR) loss that simultaneously reweights based on user activity level and item popularity, upweighting power-niche users the most. We instantiate the framework on both deep and shallow collaborative filtering models, and experiments on benchmark datasets show that PAIR reduces popularity bias and can increase overall performance. Although existing popularity-bias mitigation methods yield a trade-off between performance and bias, our results suggest that considering both user activity level and popularity preference leads to Pareto-dominant performance.
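As a rough illustration of the two ideas above (the two-axis user partition and an activity-and-popularity-aware reweighting of the BPR loss), the following is a minimal sketch and not the PAIR formulation itself, since the abstract does not specify the exact weights. The function names, the quantile thresholds `activity_q` and `niche_q`, and the exponents `alpha` and `beta` are all hypothetical choices made for this example.

```python
import numpy as np

def partition_users(interactions, n_users, n_items, activity_q=0.8, niche_q=0.5):
    """Label each user as power/light and niche/mainstream.

    interactions: integer array of shape (n, 2) with (user_id, item_id) rows.
    The quantile thresholds are illustrative assumptions, not the paper's values.
    """
    users, items = interactions[:, 0], interactions[:, 1]
    item_pop = np.bincount(items, minlength=n_items)   # interactions per item
    activity = np.bincount(users, minlength=n_users)   # interactions per user
    # Mean popularity of the items each user interacted with.
    pop_sum = np.bincount(users, weights=item_pop[items], minlength=n_users)
    mean_pop = pop_sum / np.maximum(activity, 1)
    is_power = activity >= np.quantile(activity, activity_q)
    is_niche = mean_pop <= np.quantile(mean_pop, niche_q)
    return is_power, is_niche, item_pop

def triple_weights(users, pos_items, is_power, is_niche, item_pop, alpha=1.0, beta=0.5):
    """Hypothetical weights: boost power-niche users, down-weight popular positives."""
    user_w = 1.0 + alpha * (is_power[users] & is_niche[users])
    item_w = (item_pop[pos_items] + 1.0) ** (-beta)
    return user_w * item_w

def weighted_bpr_loss(pos_scores, neg_scores, weights):
    """Weighted BPR loss: mean of -w * log(sigmoid(s_pos - s_neg))."""
    diff = pos_scores - neg_scores
    # -log(sigmoid(x)) = log(1 + exp(-x)), computed stably with logaddexp.
    return float(np.mean(weights * np.logaddexp(0.0, -diff)))
```

In this sketch, a (user, positive item, negative item) triple contributes most to the loss when the user is both highly active and niche-preferring and the positive item is unpopular, which mirrors the stated intent of upweighting power-niche users the most. The scores can come from any collaborative filtering model.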