In this paper, we introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits. At each round, traditional TS requires samples from the current posterior distribution, which is usually intractable. To circumvent this issue, approximate inference techniques can be used to provide samples whose distribution is close to the posterior. However, current approximate techniques either yield poor estimates (Laplace approximation) or are computationally expensive (MCMC methods, ensemble sampling, etc.). We propose a new algorithm, Variational Inference Thompson Sampling (VITS), based on Gaussian variational inference. This scheme provides powerful posterior approximations that are easy to sample from and is computationally efficient, making it an ideal choice for TS. In addition, we show that VITS achieves a sub-linear regret bound of the same order in the dimension and the number of rounds as traditional TS for linear contextual bandits. Finally, we demonstrate experimentally the effectiveness of VITS on both synthetic and real-world datasets.
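To make the sample-then-act loop of TS concrete, the following is a minimal sketch for a linear contextual bandit. It is not the VITS update itself: it assumes a linear reward model with Gaussian noise, for which the posterior is exactly Gaussian and available in closed form, so no variational optimization is needed. The function name `run_linear_ts`, its parameters, and the synthetic unit-norm contexts are all illustrative assumptions and do not come from the paper.

```python
import numpy as np


def run_linear_ts(theta_star, n_rounds=500, n_arms=5, noise_std=0.1,
                  prior_precision=1.0, seed=0):
    """Thompson sampling for a linear contextual bandit (illustrative sketch).

    Assumption: rewards are linear in the context with Gaussian noise, so the
    posterior over the parameter is Gaussian in closed form. This stands in
    for the Gaussian approximation; it is not the VITS algorithm.
    """
    rng = np.random.default_rng(seed)
    dim = theta_star.shape[0]
    precision = prior_precision * np.eye(dim)  # posterior precision matrix
    b = np.zeros(dim)                          # precision-weighted reward sum
    cumulative_regret = np.zeros(n_rounds)
    total_regret = 0.0

    for t in range(n_rounds):
        # Hypothetical synthetic contexts: one unit-norm vector per arm.
        contexts = rng.normal(size=(n_arms, dim))
        contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)

        # Sample theta ~ N(mean, precision^{-1}) via a Cholesky factor.
        mean = np.linalg.solve(precision, b)
        chol = np.linalg.cholesky(precision)   # precision = chol @ chol.T
        theta_sample = mean + np.linalg.solve(chol.T, rng.normal(size=dim))

        # Act greedily with respect to the sampled parameter.
        arm = int(np.argmax(contexts @ theta_sample))
        expected = contexts @ theta_star
        reward = expected[arm] + noise_std * rng.normal()

        # Conjugate Gaussian update of the posterior.
        precision += np.outer(contexts[arm], contexts[arm]) / noise_std**2
        b += reward * contexts[arm] / noise_std**2

        total_regret += expected.max() - expected[arm]
        cumulative_regret[t] = total_regret

    posterior_mean = np.linalg.solve(precision, b)
    return cumulative_regret, posterior_mean
```

Calling `run_linear_ts(np.array([0.5, -0.3, 0.8]))` returns the cumulative regret curve and the final posterior mean. In non-conjugate models, such as logistic rewards, the closed-form update above is unavailable, which is the setting where an approximate Gaussian posterior of the kind described in the abstract would replace it.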