Diffusion generative models have achieved remarkable success in generating images at a fixed resolution. However, existing models generalize poorly to other resolutions when training data at those resolutions are not available. Leveraging techniques from operator learning, we present a novel deep-learning architecture, Dual-FNO UNet (DFU), which approximates the score operator by combining spatial and spectral information at multiple resolutions. Comparisons of DFU to baselines demonstrate its scalability: 1) training simultaneously on multiple resolutions improves FID over training at any single fixed resolution; 2) DFU generalizes beyond its training resolutions, allowing coherent, high-fidelity generation at higher resolutions with the same model, i.e., zero-shot super-resolution image generation; 3) a fine-tuning strategy that we propose further enhances the zero-shot super-resolution image-generation capability of our model, leading to an FID of 11.3 at 1.66 times the maximum training resolution on FFHQ, which no other method comes close to achieving.
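The resolution generalization described above rests on a property of Fourier neural operator (FNO) layers: their weights act on a fixed set of low Fourier modes rather than on pixels, so the same parameters can be applied to grids of any size. The following is a minimal, dependency-free sketch of that idea in one dimension, not the DFU architecture itself. The function names, the `weights` dictionary, and the test signal are hypothetical, chosen for illustration only; a practical layer would use an FFT, operate on 2D multi-channel inputs, and learn the weights.

```python
import cmath
import math


def spectral_conv1d(x, weights):
    """Toy 1D spectral convolution on a periodic signal.

    x: list of n real samples taken on the grid m / n, m = 0..n-1.
    weights: dict mapping an integer frequency k to a complex multiplier
             (a hypothetical stand-in for learned parameters).
    Frequencies that are not keys of `weights` are discarded.
    """
    n = len(x)
    if any(2 * abs(k) >= n for k in weights):
        raise ValueError("every retained frequency must lie below Nyquist")
    out = [0.0] * n
    for k, w_k in weights.items():
        # Fourier-series coefficient, divided by n so that its value
        # does not depend on the sampling resolution.
        c_k = sum(
            x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)
        ) / n
        # Scale the mode by its weight and add it back on the same grid.
        for m in range(n):
            out[m] += (w_k * c_k * cmath.exp(2j * math.pi * k * m / n)).real
    return out


def sample(n):
    """Sample one band-limited test signal at resolution n (assumed example)."""
    return [
        math.cos(2 * math.pi * 2 * m / n) + 0.5 * math.sin(2 * math.pi * 3 * m / n)
        for m in range(n)
    ]


# One set of weights, reused unchanged at two resolutions.
weights = {k: complex(1.0 / (1 + abs(k)), 0.1 * k) for k in range(-4, 5)}
y_low = spectral_conv1d(sample(32), weights)
y_high = spectral_conv1d(sample(64), weights)

# Every second point of the 64-sample output lies on the 32-sample grid.
max_gap = max(abs(a - b) for a, b in zip(y_low, y_high[::2]))
```

In this sketch the two outputs agree on their shared grid points up to floating-point error, because the input is band-limited and the weights are indexed by frequency rather than by pixel position. Real images are not band-limited, which is one plausible reason an architecture would also carry spatial information, as the abstract describes.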