Thompson sampling has emerged as an effective heuristic for a broad range of online decision problems. In its basic form, the algorithm requires computing and sampling from a posterior distribution over models, which is tractable only for simple special cases. This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks. Ensemble sampling dramatically expands the range of applications for which Thompson sampling is viable. We establish a theoretical basis that supports the approach and present computational results that offer further insight.
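To make the idea concrete, the following is a minimal sketch of ensemble sampling, not the paper's reference implementation. It assumes a linear-Gaussian bandit rather than a neural network, so that each ensemble member can be refit in closed form. The function name `ensemble_sampling` and its parameters (`num_models`, `horizon`, `noise_std`, `prior_std`) are illustrative choices that do not appear in the abstract. Each model starts from an independent prior draw and is trained on independently perturbed rewards; at every step one model is picked uniformly at random and the action that is greedy with respect to it is played, standing in for a posterior sample.

```python
import numpy as np

def ensemble_sampling(actions, theta_true, num_models=10, horizon=200,
                      noise_std=1.0, prior_std=1.0, seed=0):
    """Sketch of ensemble sampling on a linear-Gaussian bandit (assumed setting).

    actions:    (num_actions, dim) array of feature vectors.
    theta_true: (dim,) parameter used only to simulate rewards.
    Returns the sequence of chosen action indices and the final ensemble.
    """
    rng = np.random.default_rng(seed)
    num_actions, dim = actions.shape

    # All models observe the same actions, so they share one precision matrix.
    precision = np.eye(dim) / prior_std**2
    # Each model starts from its own independent draw from the prior N(0, prior_std^2 I).
    models = rng.normal(0.0, prior_std, size=(num_models, dim))
    # Per-model right-hand side of the regularized least-squares normal equations.
    b = models / prior_std**2

    chosen = []
    for _ in range(horizon):
        # Pick one model uniformly at random and act greedily with respect to it.
        m = rng.integers(num_models)
        a = int(np.argmax(actions @ models[m]))
        x = actions[a]

        # Simulated noisy reward from the (hypothetical) true model.
        reward = x @ theta_true + rng.normal(0.0, noise_std)

        # Every model is updated with the same action but its own perturbed reward.
        precision += np.outer(x, x) / noise_std**2
        perturbed = reward + rng.normal(0.0, noise_std, size=num_models)
        b += np.outer(perturbed, x) / noise_std**2

        # Closed-form refit of all models at once; result has shape (num_models, dim).
        models = np.linalg.solve(precision, b.T).T
        chosen.append(a)

    return np.array(chosen), models


if __name__ == "__main__":
    actions = np.eye(3)                      # three one-hot arms (toy example)
    theta_true = np.array([0.1, 0.5, 0.9])   # hypothetical mean rewards
    chosen, models = ensemble_sampling(actions, theta_true)
    print("pulls per arm:", np.bincount(chosen, minlength=3))
```

The per-step cost here is one linear solve shared by all ensemble members, independent of how the exact posterior would be represented. With a neural network, the closed-form refit would presumably be replaced by incremental gradient steps on each member, which this sketch does not attempt to show.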