Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Despite their superior practical performance, their theoretical justification has received little attention in the literature, especially for contextual bandit problems. To fill this gap, we propose a theoretical framework for analyzing the impact of approximate inference in stochastic linear bandits and carry out a regret analysis of two Bayesian bandit algorithms: Linear Thompson Sampling (LinTS) and the linear extension of the Bayesian Upper Confidence Bound, namely Linear Bayesian Upper Confidence Bound (LinBUCB). We show that, when applied with approximate inference, LinTS and LinBUCB preserve their original regret rates at the cost of larger constant terms. These results hold for general Bayesian inference approaches, assuming the inference error, measured by two different $\alpha$-divergences, is bounded. In addition, by introducing a new definition of well-behaved distributions, we show that LinBUCB improves the regret rate of LinTS from $\tilde{O}(d^{3/2}\sqrt{T})$ to $\tilde{O}(d\sqrt{T})$, matching the minimax optimal rate. To our knowledge, this work provides the first regret bounds for stochastic linear bandits with bounded approximate inference errors.
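To make the setting concrete, below is a minimal sketch, not the paper's implementation, of Linear Thompson Sampling where the exact Gaussian posterior is replaced by an approximate one. The inflation factor on the posterior covariance, the arm set, and all constants are illustrative assumptions standing in for a bounded inference error.

```python
# Minimal illustrative sketch of LinTS with an approximate posterior.
# The covariance inflation factor (1.5) is an assumed stand-in for a
# bounded approximate-inference error; it is not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, T, noise_sd, lam = 5, 2000, 0.1, 1.0
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

V = lam * np.eye(d)   # regularized design matrix
b = np.zeros(d)       # running sum of x_t * r_t
regret = 0.0

for t in range(T):
    # Fresh set of unit-norm arms each round (contextual setting).
    arms = rng.normal(size=(20, d))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)

    # Exact posterior would be N(mu, V^{-1}); the approximate posterior
    # inflates the covariance, mimicking bounded inference error.
    V_inv = np.linalg.inv(V)
    mu = V_inv @ b
    theta_sample = rng.multivariate_normal(mu, 1.5 * V_inv)

    # LinTS: act greedily with respect to the posterior sample.
    idx = int(np.argmax(arms @ theta_sample))
    x = arms[idx]
    r = x @ theta_star + noise_sd * rng.normal()

    V += np.outer(x, x)
    b += r * x
    regret += np.max(arms @ theta_star) - x @ theta_star

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```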