Efficient decision-making in contextual bandits with large action spaces is challenging, as methods lacking additional prior information may suffer from computational and statistical inefficiencies. In this work, we leverage pre-trained diffusion models as priors to capture complex action distributions and introduce a diffusion-based decision framework for contextual bandits. We develop practical algorithms to efficiently approximate posteriors under diffusion priors, enabling flexible decision-making strategies. Empirical evaluations demonstrate the effectiveness and versatility of our approach across diverse contextual bandit settings.