Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly, or approximately using the Laplace approximation. The Gaussian prior is computationally efficient, but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate conditional posteriors, one for each stage of the reverse diffusion process, each obtained by the Laplace approximation. Our approximations are motivated by posterior sampling with a Gaussian prior and inherit its simplicity and efficiency. They are asymptotically consistent and perform well empirically on a variety of contextual bandit problems.
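To make the Gaussian-prior baseline concrete, the following is a minimal sketch of exact posterior sampling (Thompson sampling) for a linear contextual bandit. It is not the diffusion-prior algorithm proposed in this work. It assumes a linear reward model with Gaussian noise of known variance and a zero-mean isotropic Gaussian prior, in which case the posterior is Gaussian in closed form and no Laplace approximation is needed. All names (`GaussianLinearTS`, `prior_var`, `noise_var`, `run_demo`) are hypothetical and chosen for illustration only.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def solve_upper_t(low, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / low[i][i]
    return x


class GaussianLinearTS:
    """Assumed model: reward = <theta, x> + N(0, noise_var), theta ~ N(0, prior_var * I)."""

    def __init__(self, dim, prior_var=1.0, noise_var=1.0, seed=0):
        self.dim = dim
        self.noise_var = noise_var
        # Posterior precision matrix and precision-weighted mean vector.
        self.precision = [[1.0 / prior_var if i == j else 0.0 for j in range(dim)]
                          for i in range(dim)]
        self.b = [0.0] * dim
        self.rng = random.Random(seed)

    def posterior_mean(self):
        low = cholesky(self.precision)
        return solve_upper_t(low, solve_lower(low, self.b))

    def sample(self):
        """Draw theta from the exact Gaussian posterior N(mean, precision^{-1})."""
        low = cholesky(self.precision)
        mean = solve_upper_t(low, solve_lower(low, self.b))
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        noise = solve_upper_t(low, z)  # L^{-T} z has covariance precision^{-1}
        return [m + e for m, e in zip(mean, noise)]

    def choose(self, arm_features):
        """Pick the arm that is best under one posterior sample."""
        theta = self.sample()
        scores = [sum(t * x for t, x in zip(theta, feats)) for feats in arm_features]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, x, reward):
        """Conjugate update after observing `reward` for feature vector `x`."""
        for i in range(self.dim):
            self.b[i] += x[i] * reward / self.noise_var
            for j in range(self.dim):
                self.precision[i][j] += x[i] * x[j] / self.noise_var


def run_demo(rounds=200, seed=1):
    """Simulate a two-arm problem with a hypothetical true parameter."""
    rng = random.Random(seed)
    true_theta = [1.0, -0.5]
    agent = GaussianLinearTS(dim=2, seed=seed)
    arms = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(rounds):
        k = agent.choose(arms)
        mean_reward = sum(t * x for t, x in zip(true_theta, arms[k]))
        agent.update(arms[k], mean_reward + rng.gauss(0.0, 1.0))
    return agent.posterior_mean()
```

The sketch stores the posterior precision rather than the covariance, so each update is a rank-one addition and a sample needs only one Cholesky factorization. With a non-Gaussian reward model the posterior is no longer available in closed form, which is where a Laplace approximation would take the place of the exact update shown here.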