Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework that significantly reduces training time while improving data efficiency, thereby helping to democratize diffusion model training for a broader range of users. At the core of our approach is a new patch-level conditional score function: the patch's location in the original image is supplied as additional coordinate channels, and the patch size is randomized and diversified throughout training to encode cross-region dependencies at multiple scales. Sampling with our method is as easy as with the original diffusion model. With Patch Diffusion, we achieve $\mathbf{\ge 2\times}$ faster training while maintaining comparable or better generation quality. Patch Diffusion also improves the performance of diffusion models trained from scratch on relatively small datasets, e.g., with as few as 5,000 images. We achieve state-of-the-art FID scores of 1.77 on CelebA-64$\times$64 and 1.93 on AFHQv2-Wild-64$\times$64. We will release our code and pre-trained models soon.
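The central idea, conditioning a patch-level score network on where the patch sits in the full image while randomizing the patch size, can be illustrated with a minimal NumPy sketch. It assumes pixel coordinates normalized to [-1, 1] over the full image, a hypothetical patch-size set and sampling probabilities, and a simple additive Gaussian noising step at a single noise level `sigma`. The function names are illustrative only and are not taken from the authors' code.

```python
import numpy as np


def coordinate_channels(height, width):
    """Return a (2, H, W) array of x/y coordinates normalized to [-1, 1] (assumed convention)."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")  # yy[i, j] = ys[i], xx[i, j] = xs[j]
    return np.stack([xx, yy], axis=0)


def sample_patch(image, patch_sizes, probs, rng):
    """Crop a square patch of randomly chosen size at a random location.

    image: (C, H, W) array. Returns the pixel patch (C, s, s) and the matching
    crop of the full-image coordinate grid (2, s, s), so the coordinates
    encode the patch's position in the original image.
    """
    _, h, w = image.shape
    s = int(rng.choice(patch_sizes, p=probs))  # hypothetical size schedule
    top = int(rng.integers(0, h - s + 1))
    left = int(rng.integers(0, w - s + 1))
    coords = coordinate_channels(h, w)
    patch = image[:, top:top + s, left:left + s]
    patch_coords = coords[:, top:top + s, left:left + s]
    return patch, patch_coords


def make_training_input(image, sigma, patch_sizes, probs, rng):
    """Build one noisy network input: noise is added to pixels only, then coordinates are appended."""
    patch, patch_coords = sample_patch(image, patch_sizes, probs, rng)
    noise = rng.standard_normal(patch.shape)
    noisy = patch + sigma * noise
    net_input = np.concatenate([noisy, patch_coords], axis=0)  # (C + 2, s, s)
    return net_input, noise


rng = np.random.default_rng(0)
img = rng.standard_normal((3, 64, 64))  # stand-in for a 64x64 RGB image
x, eps = make_training_input(img, sigma=0.5, patch_sizes=[16, 32, 64],
                             probs=[0.3, 0.2, 0.5], rng=rng)
print(x.shape)  # 5 channels; spatial size depends on the sampled patch size
```

Because the coordinate grid is built for the full image and then cropped, patches drawn from different locations carry different coordinate values, which is what lets a patch-level network learn location-dependent structure. In this sketch, full-resolution sampling would simply feed the uncropped grid alongside the full-size noisy image.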