The safe linear bandit problem is a version of the classic linear bandit problem in which the learner's actions must satisfy an uncertain linear constraint in every round. Due to its applicability to many real-world settings, this problem has received considerable attention in recent years. We find that by exploiting the geometry of the specific problem setting, we can achieve improved regret guarantees both for well-separated problem instances and for action sets that are finite star-convex sets. Additionally, we propose a novel algorithm for this setting that chooses problem parameters adaptively and enjoys regret guarantees at least as good as those of existing algorithms. Lastly, we introduce a generalization of the safe linear bandit setting in which the constraints are convex, and we adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis-based approach. Simulation results show improved performance over existing algorithms across a variety of randomly sampled settings.
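The abstract does not spell out how a learner can respect a constraint it does not know. The following is a minimal sketch of one standard idea from the safe linear bandit literature, not the algorithm proposed here: estimate the unknown constraint vector by regularized least squares from noisy constraint feedback, then play only actions whose pessimistic (upper-confidence) constraint value stays below the threshold. Everything in it is an illustrative assumption: the names `ConstraintEstimator` and `pessimistic_safe_set`, the true vector `gamma_star`, the threshold `b`, the noise level, the known-safe seed actions, and the confidence multiplier `beta`, which is hand-picked rather than derived from a concentration bound.

```python
import math
import random

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

class ConstraintEstimator:
    """Hypothetical ridge-regression estimate of the unknown constraint vector."""
    def __init__(self, d, lam=1.0):
        # Gram matrix V = lam * I + sum_t x_t x_t^T
        self.V = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.s = [0.0] * d  # running sum of y_t * x_t

    def update(self, x, y):
        for i in range(len(x)):
            for j in range(len(x)):
                self.V[i][j] += x[i] * x[j]
            self.s[i] += y * x[i]

    def estimate(self):
        return solve(self.V, self.s)  # gamma_hat = V^{-1} s

    def width(self, x):
        return math.sqrt(dot(x, solve(self.V, x)))  # ||x||_{V^{-1}}

def pessimistic_safe_set(actions, est, b, beta):
    """Keep actions whose upper confidence bound on the constraint value is <= b."""
    g_hat = est.estimate()
    return [x for x in actions if dot(g_hat, x) + beta * est.width(x) <= b]

# --- usage under hypothetical parameters ---
random.seed(0)
gamma_star = [1.0, 0.5]             # true constraint vector, hidden from the learner
b = 1.0                             # action x is safe iff gamma_star . x <= b
noise_sd = 0.1                      # Gaussian noise on the constraint feedback
seeds = [[0.3, 0.0], [0.0, 0.3]]    # assumed known-safe exploration actions

est = ConstraintEstimator(d=2, lam=1.0)
for t in range(1000):
    x = seeds[t % 2]
    est.update(x, dot(gamma_star, x) + random.gauss(0.0, noise_sd))

actions = [[0.2, 0.2], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
safe = pessimistic_safe_set(actions, est, b, beta=2.0)
print(safe)
```

The pessimistic check deliberately trades reward for safety: `[1.0, 0.0]` sits exactly on the constraint boundary, so its confidence interval cannot be certified below `b` and it is excluded, while actions with enough slack are kept. How aggressively a learner can shrink this pessimism is where the regret analysis of safe linear bandit algorithms comes in.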