This paper presents enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation. While conventional approaches rely on convolutional U-Net architectures, recent Transformer-based designs have demonstrated superior performance and scalability. However, Transformer architectures, which tokenize input data (via "patchification"), face a trade-off between visual fidelity and computational complexity because self-attention scales quadratically with token length. While larger patch sizes make attention computation more efficient, they struggle to capture fine-grained visual details, leading to image distortions. To address this challenge, we propose augmenting the diffusion model with a multi-resolution network (DiMR), a framework that refines features across multiple resolutions, progressively enhancing detail from low to high resolution. Additionally, we introduce Time-Dependent Layer Normalization (TD-LN), a parameter-efficient approach that incorporates time-dependent parameters into layer normalization to inject time information and achieve superior performance. Our method's efficacy is demonstrated on the class-conditional ImageNet generation benchmark, where DiMR-XL variants outperform prior diffusion models, setting new state-of-the-art FID scores of 1.70 on ImageNet 256 x 256 and 2.89 on ImageNet 512 x 512. Project page: https://qihao067.github.io/projects/DiMR
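To make the TD-LN idea concrete, below is a minimal NumPy sketch of what "incorporating time-dependent parameters into layer normalization" could look like. The parameterization shown here (the scale and shift are affine functions of the normalized timestep `t`) is an illustrative assumption, not the paper's exact formulation; the class name `TDLayerNorm` and all hyperparameters are hypothetical.

```python
import numpy as np

class TDLayerNorm:
    """Sketch of time-dependent layer normalization (hypothetical details).

    Instead of a static scale/shift, gamma and beta are simple functions of
    the diffusion timestep t, so time conditioning adds only 2*dim extra
    parameters on top of a standard layer norm.
    """

    def __init__(self, dim, eps=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        # base parameters, as in standard layer norm
        self.gamma0 = np.ones(dim)
        self.beta0 = np.zeros(dim)
        # time-dependent components (assumed parameterization)
        self.gamma1 = rng.normal(0.0, 0.02, dim)
        self.beta1 = rng.normal(0.0, 0.02, dim)
        self.eps = eps

    def __call__(self, x, t):
        # normalize over the feature dimension
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        xn = (x - mu) / np.sqrt(var + self.eps)
        # timestep t in [0, 1] modulates the affine parameters
        gamma = self.gamma0 + t * self.gamma1
        beta = self.beta0 + t * self.beta1
        return gamma * xn + beta

ln = TDLayerNorm(8)
x = np.random.default_rng(1).normal(size=(2, 8))
y = ln(x, t=0.5)
```

At `t = 0` this reduces to a plain layer norm with unit scale and zero shift; different timesteps produce different affine modulations, which is how time information reaches each block without a separate timestep-embedding MLP per layer.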