In this paper, we introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits. At each round, traditional TS requires samples from the current posterior distribution, which is usually intractable. To circumvent this issue, approximate inference techniques can be used to provide samples whose distribution is close to the posterior. However, current approximate techniques either yield poor estimates (Laplace approximation) or can be computationally expensive (MCMC methods, ensemble sampling, etc.). In this paper, we propose a new algorithm, Variational Inference Thompson Sampling (VITS), based on Gaussian variational inference. This scheme provides powerful posterior approximations that are easy to sample from and is computationally efficient, making it an ideal choice for TS. In addition, we show that VITS achieves a sub-linear regret bound of the same order in the dimension and the number of rounds as traditional TS for linear contextual bandits. Finally, we demonstrate experimentally the effectiveness of VITS on both synthetic and real-world datasets.
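The TS-with-Gaussian-posterior scheme described above can be illustrated with a minimal sketch. The toy problem below (dimensions, prior variance, noise level, and the closed-form conjugate update are all illustrative assumptions, not the paper's method) shows the generic loop: sample a parameter from a Gaussian approximation of the posterior, act greedily with respect to that sample, then update the approximation with the observed reward. In VITS, the Gaussian `(mu, Sigma)` would instead be fitted by variational inference, since the closed-form update is only available in the linear-Gaussian case.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy linear contextual bandit: d-dim features, K arms, T rounds.
d, K, T = 5, 4, 500
theta_star = rng.normal(size=d)           # unknown true parameter
noise_std, prior_var = 0.1, 1.0

# Gaussian posterior approximation q(theta) = N(mu, Sigma), stored via its
# natural parameters (precision Sigma_inv and precision-weighted mean b).
Sigma_inv = np.eye(d) / prior_var
b = np.zeros(d)

regret = 0.0
for t in range(T):
    contexts = rng.normal(size=(K, d))    # one feature vector per arm
    mu = np.linalg.solve(Sigma_inv, b)
    Sigma = np.linalg.inv(Sigma_inv)
    # TS step: sample a parameter from the (approximate) posterior ...
    theta = rng.multivariate_normal(mu, Sigma)
    # ... and pull the arm that is best under the sampled parameter.
    arm = int(np.argmax(contexts @ theta))
    reward = contexts[arm] @ theta_star + noise_std * rng.normal()
    regret += np.max(contexts @ theta_star) - contexts[arm] @ theta_star
    # Conjugate Gaussian update with the observed (context, reward) pair;
    # VITS would replace this with ELBO-based fitting of (mu, Sigma).
    Sigma_inv += np.outer(contexts[arm], contexts[arm]) / noise_std**2
    b += reward * contexts[arm] / noise_std**2

print(regret / T)  # average per-round regret over the horizon
```

Because rewards here are linear with Gaussian noise, the posterior concentrates around `theta_star` and the average per-round regret shrinks as the horizon grows, matching the sub-linear-regret behavior the abstract refers to.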