Bayesian bandit algorithms with approximate Bayesian inference are widely used in real-world applications. Nevertheless, their theoretical justification has received little attention in the literature, especially for contextual bandit problems. To fill this gap, we propose a general theoretical framework for analyzing stochastic linear bandits in the presence of approximate inference, and conduct regret analyses of two Bayesian bandit algorithms: Linear Thompson Sampling (LinTS) and an extension of the Bayesian Upper Confidence Bound, namely the Linear Bayesian Upper Confidence Bound (LinBUCB). We show that, when applied with approximate inference, both LinTS and LinBUCB preserve their original rates of regret upper bound, at the cost of larger constant terms. These results hold for general Bayesian inference approaches, under the assumption that the inference error, measured by two different $\alpha$-divergences, is bounded. Additionally, by introducing a new definition of well-behaved distributions, we show that LinBUCB improves the regret rate of LinTS from $\tilde{O}(d^{3/2}\sqrt{T})$ to $\tilde{O}(d\sqrt{T})$, matching the minimax optimal rate. To our knowledge, this work provides the first regret bounds for stochastic linear bandits with bounded approximate inference errors.