We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we instead analyze Weighted Sequential Bayesian (WSB) inference, which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term quantifying the influence of the initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based bounds, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms.
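For concreteness, one standard way to maintain such a weighted posterior, shown here as a minimal sketch assuming a Gaussian prior $\theta \sim \mathcal{N}(\mu_0, \Sigma_0)$, known noise variance $\sigma^2$, and exponential weights $\gamma^{t-s}$ (the weighting scheme and notation are illustrative and need not match the WSB construction exactly), is
\[
\Sigma_t^{-1} \;=\; \Sigma_0^{-1} + \sigma^{-2}\sum_{s=1}^{t}\gamma^{\,t-s}\, x_s x_s^{\top},
\qquad
\mu_t \;=\; \Sigma_t\Bigl(\Sigma_0^{-1}\mu_0 + \sigma^{-2}\sum_{s=1}^{t}\gamma^{\,t-s}\, x_s r_s\Bigr),
\]
where $x_s$ is the chosen context and $r_s$ the observed reward at round $s$. In this sketch the prior enters only through $\Sigma_0^{-1}$ and $\Sigma_0^{-1}\mu_0$, whose relative weight shrinks as discounted data accumulate, mirroring the decaying prior influence described above.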