Existing work has shown that large-scale offline evaluation of recommender systems on user-item interactions is prone to bias introduced by the deployed system itself, a form of closed-loop feedback. Many studies adopt the concept of \textit{propensity} to analyze or mitigate this issue. In this work, we extend the analysis to the session-based setting and adapt the propensity calculation to the unique characteristics of session-based recommendation tasks. Our experiments include both neural and KNN-based models and cover the music and e-commerce domains. We study the distributions of propensity and several stratification techniques across datasets and find that propensity-related traits are dataset-specific. We then exploit the effect of stratification and obtain promising results relative to the original models.
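To make the two central notions concrete, the following is a minimal sketch of a propensity estimate and a stratified evaluation. It is not the calculation used in this work: it assumes a simple popularity-based estimator (propensity proportional to item frequency raised to a power), and the function names, the `power` parameter, and the equal-size stratification rule are all hypothetical choices made for illustration.

```python
from collections import Counter


def estimate_propensities(sessions, power=0.5):
    """Assumed estimator: (item frequency / max frequency) ** power.

    This popularity-based form is an illustrative assumption, not the
    session-specific calculation described in the abstract.
    """
    counts = Counter(item for session in sessions for item in session)
    max_count = max(counts.values())
    return {item: (c / max_count) ** power for item, c in counts.items()}


def stratify_items(propensities, n_strata=2):
    """Split items into n_strata groups of (nearly) equal size.

    Stratum 0 holds the lowest-propensity items. Ties are broken by item id.
    """
    ranked = sorted(propensities, key=lambda i: (propensities[i], i))
    size = -(-len(ranked) // n_strata)  # ceiling division
    return {item: idx // size for idx, item in enumerate(ranked)}


def stratified_hit_rate(predictions, targets, strata):
    """Hit rate reported separately for each stratum of the target item."""
    hits, totals = Counter(), Counter()
    for top_k, target in zip(predictions, targets):
        s = strata[target]
        totals[s] += 1
        hits[s] += int(target in top_k)
    return {s: hits[s] / totals[s] for s in totals}


# Toy usage with made-up sessions and made-up top-k predictions.
toy_sessions = [["a", "b", "a"], ["a", "c"], ["b", "a"]]
toy_props = estimate_propensities(toy_sessions)
toy_strata = stratify_items(toy_props, n_strata=2)
toy_scores = stratified_hit_rate(
    predictions=[["a", "b"], ["b"], ["a"]],
    targets=["a", "c", "b"],
    strata=toy_strata,
)
```

Reporting the metric per stratum, rather than as a single average, is what lets one see whether a model's accuracy is concentrated on high-propensity items.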