We consider time discretization for score-based diffusion models, which generate samples by simulating learned reverse-time dynamics on a finite grid. Uniform and hand-crafted grids can be suboptimal under a budget on the number of time steps. We introduce Adaptive Reparameterized Time (ART), which controls the clock speed of a reparameterized time variable, inducing a time change and uneven timesteps along the sampling trajectory while preserving the terminal time. The objective is to minimize the aggregate error incurred by the discretized Euler scheme. We derive a randomized control companion, ART-RL, which formulates the time change as a continuous-time reinforcement learning (RL) problem with Gaussian policies. We then prove that solving ART-RL recovers the optimal ART schedule, which in turn enables practical actor--critic updates that learn the schedule in a data-driven way. Empirically, building on the official EDM pipeline, ART-RL improves the Fréchet Inception Distance on CIFAR-10 over a wide range of budgets and transfers to AFHQv2, FFHQ, and ImageNet without retraining.
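To make the time-change idea concrete, the following is a minimal sketch (not the paper's implementation): a monotone reparameterization maps a uniform clock onto uneven timesteps while keeping the endpoints fixed, and an Euler sampler then steps along the warped grid. The power-law warp, the noise range, and the linear `score_fn` used below are illustrative assumptions; ART would learn the warp rather than fix it by hand.

```python
import numpy as np

def warped_grid(n_steps, t_min=0.002, t_max=80.0, rho=7.0):
    """Map a uniform clock s in [0, 1] to uneven timesteps t in [t_max, t_min].

    An EDM-style power-law warp serves here as a hand-crafted example of a
    time change; the terminal times t_max and t_min are preserved exactly.
    """
    s = np.linspace(0.0, 1.0, n_steps + 1)  # uniform reparameterized clock
    inv = 1.0 / rho
    t = (t_max**inv + s * (t_min**inv - t_max**inv)) ** rho
    return t

def euler_sample(x, score_fn, t_grid):
    """Euler discretization of the probability-flow ODE dx/dt = -t * score(x, t).

    The aggregate discretization error depends on where the uneven steps
    t_grid[i] -> t_grid[i+1] are placed, which is what ART optimizes.
    """
    for t_cur, t_next in zip(t_grid[:-1], t_grid[1:]):
        dx = -t_cur * score_fn(x, t_cur)
        x = x + (t_next - t_cur) * dx
    return x
```

Usage: with a hypothetical Gaussian score `lambda x, t: -x / (t**2 + 1)`, `euler_sample(x0, score, warped_grid(18))` integrates from `t_max` down to `t_min` in 18 uneven steps; a learned warp would replace `warped_grid` while keeping the sampler unchanged.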