Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable. In this paper, we establish a regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem. This represents the first rigorous regret analysis of ensemble sampling and is made possible by leveraging information-theoretic concepts and novel analytic techniques that may prove useful beyond the scope of this paper.
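The mechanism the abstract refers to can be sketched for a linear-Gaussian bandit. The sketch below is illustrative only and is not the analyzed algorithm verbatim. It assumes a Gaussian prior theta ~ N(0, prior_var * I), Gaussian reward noise, and a finite action set. It uses the perturbed-prior, perturbed-reward least-squares construction commonly used for ensemble sampling. Each of M models is fit to its own randomly perturbed data instead of maintaining the exact posterior. At each step, one model is drawn uniformly and the agent acts greedily with respect to it, mimicking a Thompson sampling draw. The class and parameter names (`EnsembleSamplingLinearBandit`, `n_models`, `prior_var`, `noise_var`) are hypothetical.

```python
import numpy as np


class EnsembleSamplingLinearBandit:
    """Hypothetical sketch of ensemble sampling for a linear-Gaussian bandit.

    Assumptions (illustrative): prior theta ~ N(0, prior_var * I),
    reward = theta^T x + N(0, noise_var), and a fixed finite action set.
    """

    def __init__(self, actions, n_models=10, prior_var=1.0, noise_var=1.0, seed=0):
        self.actions = np.asarray(actions, dtype=float)  # shape (K, d)
        self.rng = np.random.default_rng(seed)
        d = self.actions.shape[1]
        self.noise_var = noise_var
        # Precision matrix of the regularized least-squares problem. It is
        # shared by all members because they observe the same features.
        self.precision = np.eye(d) / prior_var
        # Each member starts from its own perturbed prior sample theta_m0,
        # stored as b_m = theta_m0 / prior_var.
        prior_samples = self.rng.normal(0.0, np.sqrt(prior_var), size=(n_models, d))
        self.b = prior_samples / prior_var  # shape (M, d)

    def models(self):
        # theta_m = precision^{-1} b_m for every member, in a single solve.
        return np.linalg.solve(self.precision, self.b.T).T  # shape (M, d)

    def select_action(self):
        # Approximate Thompson step: pick one member uniformly, act greedily.
        m = self.rng.integers(len(self.b))
        theta = self.models()[m]
        return int(np.argmax(self.actions @ theta))

    def update(self, action_index, reward):
        x = self.actions[action_index]
        self.precision += np.outer(x, x) / self.noise_var
        # Each member fits an independently perturbed copy of the reward.
        perturbed = reward + self.rng.normal(0.0, np.sqrt(self.noise_var), size=len(self.b))
        self.b += np.outer(perturbed, x) / self.noise_var
```

Each member's estimate is the minimizer of its perturbed, prior-regularized squared loss. Because every member sees the same actions, all members share one precision matrix and differ only in their `b` vectors. This is what keeps the per-step cost proportional to M rather than requiring a full posterior.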