Multi-armed bandits (MABs) are increasingly used in online platforms and e-commerce to optimize decision-making for personalized user experiences. In this work, we focus on the contextual bandit problem with linear rewards under sparsity and batched data. We address the challenge of fairness by excluding irrelevant features from the decision-making process using a novel algorithm, Online Batched Sequential Inclusion (OBSI), which sequentially includes features as confidence in their impact on the reward increases. Our experiments on synthetic data show that OBSI outperforms the other algorithms we compare against in terms of regret, relevance of the features used, and computational cost.
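To make the idea of sequential inclusion concrete, the following is a minimal sketch of confidence-gated feature inclusion over batches. It is not the OBSI algorithm itself, whose details the abstract does not give. It assumes independent, zero-mean Gaussian features, tests each feature with a univariate least-squares fit, and omits arm selection entirely. The names `update_included` and `run_demo`, the parameter vector `theta`, the noise level, and the threshold `z` are all hypothetical choices made for illustration.

```python
import math
import random


def update_included(included, sums_xr, sums_xx, sum_rr, n, z=5.0):
    """Add each feature whose univariate slope is confidently nonzero.

    Features are only ever added, never removed, so the included set
    grows monotonically from batch to batch.
    """
    for j in range(len(sums_xr)):
        if j in included or sums_xx[j] == 0.0:
            continue
        beta = sums_xr[j] / sums_xx[j]  # no-intercept least-squares slope
        # Residual sum of squares of the one-feature fit r ~ beta * x_j.
        rss = max(sum_rr - beta * sums_xr[j], 0.0)
        se = math.sqrt(rss / max(n - 1, 1) / sums_xx[j])
        if abs(beta) > z * se:
            included.add(j)
    return included


def run_demo(seed=0, n_batches=10, batch_size=50):
    """Simulate batched observations from a sparse linear reward model."""
    rng = random.Random(seed)
    theta = [2.0, -1.5, 0.0, 0.0, 0.0]  # hypothetical: only features 0, 1 matter
    d = len(theta)
    sums_xr = [0.0] * d  # running sum of x_j * r per feature
    sums_xx = [0.0] * d  # running sum of x_j ** 2 per feature
    sum_rr = 0.0         # running sum of r ** 2
    n = 0
    included = set()
    history = []
    for _batch in range(n_batches):
        # Rewards are only processed once the whole batch has been collected.
        for _step in range(batch_size):
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]
            r = sum(t * xi for t, xi in zip(theta, x)) + rng.gauss(0.0, 0.1)
            for j in range(d):
                sums_xr[j] += x[j] * r
                sums_xx[j] += x[j] * x[j]
            sum_rr += r * r
            n += 1
        update_included(included, sums_xr, sums_xx, sum_rr, n)
        history.append(sorted(included))
    return history


if __name__ == "__main__":
    for batch_index, feats in enumerate(run_demo(), start=1):
        print(f"after batch {batch_index}: included features {feats}")
```

The conservative threshold keeps the zero-coefficient features out of the included set while the evidence for them stays weak, which is the behavior the abstract links to fairness. A real contextual bandit would also choose arms using only the included features. That step is left out here because arm selection changes the distribution of observed contexts and would need a more careful estimator than the univariate fit assumed above.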