We address the problem of accurate, training-free guidance for conditional generation with trained diffusion models. Existing methods typically rely on point estimates to approximate the posterior score, which often yields biased approximations that fail to capture the multimodality inherent in the reverse process of diffusion models. We propose a sequential Monte Carlo (SMC) framework that constructs an unbiased estimator of $p_\theta(y \mid x_t)$ by integrating over the full denoising distribution via Monte Carlo approximation. To keep the computation tractable, we incorporate variance-reduction schemes based on Multi-Level Monte Carlo (MLMC). Our approach sets a new state of the art for training-free guidance on CIFAR-10 class-conditional generation, reaching $95.6\%$ accuracy at $3\times$ lower cost-per-success than baselines. On ImageNet, our algorithm achieves a $1.5\times$ cost-per-success advantage over existing methods.
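To illustrate why a point estimate of the denoising distribution can bias the guidance likelihood, the following is a minimal toy sketch and not the method of the paper. It assumes a one-dimensional, two-mode prior $x_0 \in \{-1, +1\}$, a Gaussian noising step $x_t = x_0 + \sigma \varepsilon$, and a hypothetical sigmoid classifier standing in for $p(y \mid x_0)$. All function names and parameters are illustrative assumptions. The sketch compares plugging the posterior mean into the classifier against a plain Monte Carlo average over samples from $p(x_0 \mid x_t)$; it does not implement SMC or MLMC.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def likelihood(x0, sharpness=8.0):
    """Hypothetical classifier p(y=1 | x0); an assumption for this toy example."""
    return sigmoid(sharpness * x0)


def posterior_prob_plus(xt, sigma):
    """p(x0 = +1 | xt) for an equal-weight prior on {-1, +1} and xt = x0 + sigma * noise."""
    return sigmoid(2.0 * xt / sigma**2)


def exact_likelihood(xt, sigma):
    """Exact p(y=1 | xt): the classifier averaged over both posterior modes."""
    p_plus = posterior_prob_plus(xt, sigma)
    return p_plus * likelihood(1.0) + (1.0 - p_plus) * likelihood(-1.0)


def point_estimate(xt, sigma):
    """Point-estimate approximation: evaluate the classifier at E[x0 | xt]."""
    p_plus = posterior_prob_plus(xt, sigma)
    x0_mean = 2.0 * p_plus - 1.0
    return likelihood(x0_mean)


def monte_carlo_estimate(xt, sigma, n_samples, rng):
    """Unbiased estimate: average the classifier over samples x0 ~ p(x0 | xt)."""
    p_plus = posterior_prob_plus(xt, sigma)
    total = 0.0
    for _ in range(n_samples):
        x0 = 1.0 if rng.random() < p_plus else -1.0
        total += likelihood(x0)
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    xt, sigma = 0.2, 1.0
    print("exact         :", exact_likelihood(xt, sigma))
    print("point estimate:", point_estimate(xt, sigma))
    print("Monte Carlo   :", monte_carlo_estimate(xt, sigma, 20000, rng))
```

In this toy setting the posterior mean lies between the two modes, where the classifier behaves differently than at either mode, so the point estimate deviates from the exact value while the Monte Carlo average converges to it as the number of samples grows.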