With the increasing availability of online information, recommender systems have become an important tool for many web-based systems. Owing to the continuous nature of recommendation environments, these systems increasingly rely on contextual multi-armed bandits (CMAB) to deliver personalized, real-time suggestions. A critical yet underexplored component of these systems is the representation of user state, which typically encapsulates the user's interaction history and is closely tied to the model's decisions and learning. In this paper, we investigate how different embedding-based state representations, derived from matrix factorization models, affect the performance of traditional CMAB algorithms. Our large-scale experiments reveal that varying the state representation can yield larger improvements than changing the bandit algorithm itself. Furthermore, no single embedding or aggregation strategy consistently dominates across datasets, underscoring the need for domain-specific evaluation. These results expose a substantial gap in the literature and emphasize that advancing bandit-based recommender systems requires a holistic approach that prioritizes embedding quality and state construction alongside algorithmic innovation. The source code for our experiments is publicly available at https://github.com/UFSCar-LaSID/bandits_blind_spot.
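To make the notion of an embedding-based user state concrete, the following minimal sketch aggregates item embeddings from a user's history into a context vector and passes it to a contextual bandit. It is an illustration under stated assumptions, not the authors' implementation: the names `user_state` and `LinUCB` are hypothetical, the mean and last-item aggregations are only two possible strategies, disjoint LinUCB stands in for a generic traditional CMAB algorithm, and the small hand-written matrix stands in for item factors that would normally come from a matrix factorization model.

```python
import numpy as np


def user_state(item_embeddings, history, how="mean"):
    """Aggregate the embeddings of a user's past items into one context vector.

    `item_embeddings` is an (n_items, dim) array, assumed to come from a
    matrix factorization model; `history` is a list of item indices.
    """
    if len(history) == 0:
        # Cold-start users get an all-zero state (an assumption of this sketch).
        return np.zeros(item_embeddings.shape[1])
    vecs = item_embeddings[np.asarray(history)]
    if how == "mean":
        return vecs.mean(axis=0)
    if how == "last":
        return vecs[-1]
    raise ValueError(f"unknown aggregation: {how}")


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = np.stack([np.eye(dim) for _ in range(n_arms)])  # per-arm Gram matrices
        self.b = np.zeros((n_arms, dim))                         # per-arm reward-weighted sums

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        scores = np.empty(len(self.A))
        for a in range(len(self.A)):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]
            scores[a] = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold one observed (context, reward) pair into the chosen arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Toy usage: three items with 2-dimensional embeddings (stand-in values).
toy_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
state = user_state(toy_embeddings, history=[0, 1], how="mean")
bandit = LinUCB(n_arms=3, dim=2)
chosen = bandit.select(state)
bandit.update(chosen, state, reward=1.0)
```

Swapping the `how` argument, or the source of `item_embeddings`, changes the context the bandit sees without touching the bandit itself, which is the kind of variation the experiments compare against changing the algorithm.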