Using multiple user representations (MUR) instead of a single user representation (SUR) to model user behavior has been shown to improve personalization in recommendation systems. However, the performance gains observed with MUR can be sensitive to skewness in the item and/or user interest distribution. When the data distribution is highly skewed, the gains from learning multiple representations diminish because the model is dominated by head items/interests, leading to poor performance on tail items. Robustness to data sparsity is therefore essential for MUR-based approaches to achieve good recommendation performance. Yet research on MUR and research on data imbalance have largely been conducted independently. In this paper, we delve deeper into the shortcomings of MUR inferred from imbalanced data distributions. We make several contributions: (1) using synthetic datasets, we demonstrate the sensitivity of MUR to data imbalance; (2) to improve MUR for tail items, we propose an iterative density weighting scheme (IDW) with user tower calibration to mitigate the effect of training over a long-tail distribution on personalization; and (3) through extensive experiments on three real-world benchmarks, we demonstrate that IDW outperforms other alternatives that address data imbalance.