This paper proposes regenerative particle Thompson sampling (RPTS), a flexible variation of Thompson sampling. Thompson sampling is a Bayesian heuristic for solving stochastic bandit problems, but it is hard to implement in practice because maintaining a continuous posterior distribution is intractable. Particle Thompson sampling (PTS) approximates Thompson sampling by replacing the continuous distribution with a discrete distribution supported on a set of weighted static particles. We observe that in PTS, the weights of all but a few fit particles converge to zero. RPTS is based on the following heuristic: delete the decaying unfit particles and regenerate new particles in the vicinity of the fit surviving particles. Empirical evidence shows that RPTS uniformly improves on PTS and that it is flexible and effective across a set of representative bandit problems, including an application to 5G network slicing.
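The delete-and-regenerate heuristic can be illustrated with a minimal sketch for a Bernoulli bandit. The abstract gives no algorithmic details, so everything specific here is an assumption made for illustration: the function names (`pts_step`, `regenerate`, `run_rpts`), the weight threshold `w_min`, the Gaussian perturbation scale `sigma`, the rule of sampling a parent in proportion to its weight, and the initial weight `2 * w_min` given to each regenerated particle. It should not be read as the authors' exact procedure.

```python
import random

EPS = 1e-3  # keep coordinates away from 0 and 1 so likelihoods stay positive


def clip(x):
    return min(1.0 - EPS, max(EPS, x))


def pts_step(particles, weights, pull, rng):
    """One PTS step on a Bernoulli bandit; returns (new_weights, reward)."""
    # Sample one particle according to the current weights.
    theta = rng.choices(particles, weights=weights, k=1)[0]
    # Play the arm that looks best under the sampled particle.
    arm = max(range(len(theta)), key=theta.__getitem__)
    reward = pull(arm)  # 0 or 1
    # Bayes update: scale each weight by the likelihood of the observed reward.
    unnorm = [w * (p[arm] if reward else 1.0 - p[arm])
              for p, w in zip(particles, weights)]
    total = sum(unnorm)
    return [u / total for u in unnorm], reward


def regenerate(particles, weights, rng, w_min=1e-3, sigma=0.05):
    """Delete particles with weight below w_min; respawn near survivors.

    Assumed rule (not from the source): each new particle is a Gaussian
    perturbation of a survivor chosen in proportion to its weight, and it
    starts with weight 2 * w_min before renormalization.
    """
    ranked = sorted(zip(weights, particles), key=lambda wp: wp[0], reverse=True)
    # Always keep at least the heaviest particle.
    survivors = [(w, p) for w, p in ranked if w >= w_min] or ranked[:1]
    n_new = len(particles) - len(survivors)
    if n_new == 0:
        return particles, weights
    parent_w = [w for w, _ in survivors]
    parent_p = [p for _, p in survivors]
    children = []
    for _ in range(n_new):
        parent = rng.choices(parent_p, weights=parent_w, k=1)[0]
        children.append([clip(x + rng.gauss(0.0, sigma)) for x in parent])
    new_weights = parent_w + [2.0 * w_min] * n_new
    total = sum(new_weights)
    return parent_p + children, [w / total for w in new_weights]


def run_rpts(true_means, n_particles=50, horizon=500, seed=0):
    """Run the sketch; returns (particles, weights, total_reward)."""
    rng = random.Random(seed)
    k = len(true_means)
    # Initial particles: uniform random guesses of the arm means.
    particles = [[clip(rng.random()) for _ in range(k)]
                 for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles

    def pull(arm):
        return 1 if rng.random() < true_means[arm] else 0

    total_reward = 0
    for _ in range(horizon):
        weights, reward = pts_step(particles, weights, pull, rng)
        total_reward += reward
        particles, weights = regenerate(particles, weights, rng)
    return particles, weights, total_reward
```

Skipping the `regenerate` call in the loop reduces this to plain PTS with static particles, which makes it easy to compare the two variants on the same seed. For example, `run_rpts([0.2, 0.5, 0.8])` runs the regenerative variant on a three-armed bandit with hypothetical arm means.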