Approximate Thompson sampling with Langevin Monte Carlo extends Thompson sampling beyond Gaussian posteriors to more general smooth posteriors. However, it still scales poorly in high-dimensional problems when high accuracy is required. To address this, we propose an approximate Thompson sampling strategy based on underdamped Langevin Monte Carlo, the go-to workhorse for simulating high-dimensional posteriors. Under standard smoothness and log-concavity conditions, we study accelerated posterior concentration and sampling using a specific potential function. This design improves the sample complexity required to achieve logarithmic regret from $\tilde{\mathcal{O}}(d)$ to $\tilde{\mathcal{O}}(\sqrt{d})$. We also validate the scalability and robustness of our algorithm empirically through synthetic experiments on high-dimensional bandit problems.
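As a rough illustration of the idea, here is a minimal sketch of one approximate Thompson sampling round driven by underdamped Langevin Monte Carlo. It assumes a linear bandit with Gaussian rewards and a Gaussian prior, so the potential $U(\theta)$ is the negative log-posterior. This is a hypothetical stand-in, not the specific potential function studied in the work. The sketch also uses a plain Euler-Maruyama discretization with unit mass, which is not necessarily the integrator behind the $\tilde{\mathcal{O}}(\sqrt{d})$ guarantee. All names (`ulmc_sample`, `linear_gaussian_grad`, `thompson_select`) and defaults (friction 2.0, unit noise variance and prior precision) are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def ulmc_sample(
    grad_potential: Callable[[Vector], Vector],
    x0: Vector,
    step_size: float,
    n_steps: int,
    friction: float = 2.0,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Approximate sample from p(x) proportional to exp(-U(x)) via underdamped
    Langevin dynamics  dx = v dt,  dv = -(friction * v + grad U(x)) dt + sqrt(2 * friction) dB,
    discretized with a simple Euler-Maruyama scheme (unit mass, unit temperature)."""
    if rng is None:
        rng = random.Random()
    x = list(x0)
    v = [0.0] * len(x0)  # auxiliary velocity, started at rest
    noise_scale = math.sqrt(2.0 * friction * step_size)
    for _ in range(n_steps):
        g = grad_potential(x)
        # Velocity: friction damping, force -grad U, and injected Gaussian noise.
        v = [
            vi - step_size * (friction * vi + gi) + noise_scale * rng.gauss(0.0, 1.0)
            for vi, gi in zip(v, g)
        ]
        # Position moves along the freshly updated velocity.
        x = [xi + step_size * vi for xi, vi in zip(x, v)]
    return x


def linear_gaussian_grad(
    features: Sequence[Vector],
    rewards: Sequence[float],
    noise_var: float = 1.0,
    prior_precision: float = 1.0,
) -> Callable[[Vector], Vector]:
    """Gradient of the hypothetical potential (negative log-posterior)
    U(theta) = sum_t (r_t - <a_t, theta>)^2 / (2 * noise_var) + prior_precision * |theta|^2 / 2."""

    def grad(theta: Vector) -> Vector:
        g = [prior_precision * t for t in theta]  # Gaussian prior term
        for a, r in zip(features, rewards):
            residual = r - sum(ai * ti for ai, ti in zip(a, theta))
            for i, ai in enumerate(a):
                g[i] -= residual * ai / noise_var  # Gaussian likelihood term
        return g

    return grad


def thompson_select(
    arms: Sequence[Vector],
    history_features: Sequence[Vector],
    history_rewards: Sequence[float],
    step_size: float,
    n_steps: int,
    rng: Optional[random.Random] = None,
) -> int:
    """One approximate Thompson sampling round: draw theta with underdamped LMC,
    then play the arm whose sampled mean reward <a, theta> is largest."""
    grad = linear_gaussian_grad(history_features, history_rewards)
    theta = ulmc_sample(grad, [0.0] * len(arms[0]), step_size, n_steps, rng=rng)
    scores = [sum(ai * ti for ai, ti in zip(a, theta)) for a in arms]
    return max(range(len(arms)), key=scores.__getitem__)
```

The velocity variable is what distinguishes this sampler from overdamped Langevin Monte Carlo, which moves the position directly with a noisy gradient step. Because the curvature of this potential grows with the number of observations, `step_size` has to shrink as data accumulate for the explicit discretization to stay stable. Warm-starting each round's chain from the previous sample is a natural variant, left out here for brevity.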