Current multi-armed bandit approaches in recommender systems (RS) have focused on devising effective exploration techniques while not adequately addressing common exploitation challenges related to distributional changes and item cannibalization. Little work exists to guide the design of robust bandit frameworks that can address these frequent challenges in RS. In this paper, we propose new design principles that make bandit models (i) robust to time-variant metadata signals, (ii) less prone to item cannibalization, and (iii) less susceptible to weight fluctuations caused by data sparsity. Through a series of experiments, we systematically examine the influence of several important bandit design choices. Through in-depth analyses, we demonstrate the advantage of the proposed design principles in making bandit models robust to dynamic behavioral changes. Notably, relative to a baseline bandit model that does not incorporate our design choices, we show relative gains of up to $11.88\%$ in ROC-AUC and $44.85\%$ in PR-AUC. We also present case studies on fairness in recommending specific popular and unpopular titles, which demonstrate the robustness of the proposed design in addressing popularity biases.