Thompson Sampling (TS) is widely used to address the exploration/exploitation tradeoff in contextual bandits, yet recent theory shows that it does not explore aggressively enough in high-dimensional problems. Feel-Good Thompson Sampling (FG-TS) addresses this by adding an optimism bonus that biases the posterior toward high-reward models, and it achieves asymptotically minimax-optimal regret in the linear setting when posteriors are exact. However, its performance with \emph{approximate} posteriors -- common in large-scale or neural problems -- has not been benchmarked. We provide the first systematic study of FG-TS and its smoothed variant (SFG-TS) across eleven real-world and synthetic benchmarks. To evaluate robustness, we compare regimes with exact posterior sampling (linear and logistic bandits) against approximate regimes produced by fast but coarse stochastic-gradient samplers. Ablations over preconditioning, bonus scale, and prior strength reveal a trade-off: larger bonuses help when posterior samples are accurate, but hurt when sampling noise dominates. FG-TS generally outperforms vanilla TS in linear and logistic bandits, but tends to be weaker in neural bandits. Nevertheless, because FG-TS and its variants are competitive and easy to use, we recommend them as baselines in modern contextual-bandit benchmarks. Source code for all our experiments is available at https://github.com/SarahLiaw/ctx-bandits-mcmc-showdown.
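
To make the mechanism concrete, below is a minimal sketch of FG-TS for a linear contextual bandit, assuming the standard formulation in which the log-posterior is augmented with a feel-good bonus $\lambda \max_a \langle \theta, x_a \rangle$ and sampled approximately with an unadjusted Langevin sampler (one of the "fast but coarse stochastic-gradient samplers" the abstract refers to). All names and settings here (eta, lambda_fg, prior_prec, the step sizes, and the toy reward model) are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of Feel-Good Thompson Sampling (FG-TS) for a
# linear contextual bandit with approximate (Langevin) posterior sampling.
import numpy as np

rng = np.random.default_rng(0)
d = 8                 # context dimension
eta = 1.0             # likelihood inverse temperature (assumed)
lambda_fg = 0.1       # feel-good bonus scale (assumed)
prior_prec = 1.0      # Gaussian prior precision (assumed)

def fg_log_post_grad(theta, X_hist, r_hist, contexts):
    """Gradient of the FG-TS log-posterior: Gaussian prior,
    squared-loss likelihood, plus the feel-good optimism bonus."""
    grad = -prior_prec * theta                    # prior term
    if len(X_hist):
        X = np.asarray(X_hist)
        r = np.asarray(r_hist)
        grad += eta * X.T @ (r - X @ theta)       # likelihood term
    # Feel-good bonus lambda * max_a <theta, x_a>: its (sub)gradient
    # is the context of the currently best-looking arm.
    best = contexts[np.argmax(contexts @ theta)]
    grad += lambda_fg * best
    return grad

def sample_theta(theta, X_hist, r_hist, contexts, steps=50, step=1e-2):
    """Unadjusted Langevin dynamics targeting the FG-TS posterior;
    a coarse approximate sampler, as studied in the paper."""
    for _ in range(steps):
        noise = rng.standard_normal(d)
        theta = (theta
                 + step * fg_log_post_grad(theta, X_hist, r_hist, contexts)
                 + np.sqrt(2.0 * step) * noise)
    return theta

# One illustrative bandit loop with 5 candidate arms per round and a
# made-up true parameter; purely for demonstration.
theta = np.zeros(d)
theta_true = np.ones(d) / d
X_hist, r_hist = [], []
for t in range(100):
    contexts = rng.standard_normal((5, d))
    theta = sample_theta(theta, X_hist, r_hist, contexts)
    a = int(np.argmax(contexts @ theta))          # act greedily on the sample
    reward = contexts[a] @ theta_true + 0.1 * rng.standard_normal()
    X_hist.append(contexts[a])
    r_hist.append(reward)
```

Setting lambda_fg = 0 recovers (approximate) vanilla TS, which is what makes the bonus-scale ablation described above straightforward to run; SFG-TS would replace the hard max over arms with a smoothed version.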