Denoising Diffusion Models (DDMs) have become the leading generative technique for synthesizing high-quality images, but their UNet-based architectures impose several limitations. In particular, their size, often hundreds of millions of parameters, makes them impractical when hardware resources are limited. Even with powerful hardware, processing images in the gigapixel range remains difficult. This is especially true in fields such as microscopy and satellite imaging, where UNet-based DDMs are limited to a predefined generation size and scale inefficiently to larger images. We present two Neural Cellular Automata (NCA)-based DDM methods to address these challenges and jumpstart NCA-based DDMs: Diff-NCA and FourierDiff-NCA. Diff-NCA performs diffusion using only local features of the underlying distribution, making it suitable for applications where local features are critical. To communicate global knowledge in image space, naive NCA setups require a number of timesteps that grows with the image size. We solve this bottleneck of current NCA architectures by introducing FourierDiff-NCA, which extends Diff-NCA with a Fourier-based diffusion process and combines the frequency-organized Fourier space with the image space. By initiating diffusion in the Fourier domain and finalizing it in image space, FourierDiff-NCA accelerates global communication. We validate our techniques by using Diff-NCA (208k parameters) to generate high-resolution digital pathology scans at 576x576 resolution and FourierDiff-NCA (887k parameters) to synthesize CelebA images at 64x64, outperforming VNCA and UNet-based DDMs five times larger. In addition, we demonstrate FourierDiff-NCA's capabilities in super-resolution, out-of-distribution (OOD) image synthesis, and inpainting without additional training.
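The communication bottleneck described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names are hypothetical, and a 3x3 neighbourhood is assumed as a stand-in for local NCA perception. The first function counts how many local update steps a signal needs to spread from one corner to every cell; the second shows that editing a single Fourier coefficient changes every pixel after one inverse transform.

```python
import numpy as np


def steps_to_reach_all(size: int) -> int:
    """Count 3x3-neighbourhood updates until a corner signal reaches every cell.

    Assumption: each step lets a cell see only its 8 neighbours and itself,
    a simplified stand-in for the local perception of an NCA.
    """
    reached = np.zeros((size, size), dtype=bool)
    reached[0, 0] = True
    steps = 0
    while not reached.all():
        padded = np.pad(reached, 1)  # pad with False on all sides
        spread = np.zeros_like(reached)
        # A cell becomes reached if any cell in its 3x3 neighbourhood was reached.
        for dy in range(3):
            for dx in range(3):
                spread |= padded[dy:dy + size, dx:dx + size]
        reached = spread
        steps += 1
    return steps


def pixels_changed_by_one_frequency_edit(size: int) -> int:
    """Count pixels affected when a single Fourier coefficient is modified."""
    image = np.zeros((size, size))
    spectrum = np.fft.fft2(image)
    spectrum[0, 1] += size * size  # edit one low-frequency coefficient
    # Keep the complex result so the magnitude of the change is visible everywhere.
    result = np.fft.ifft2(spectrum)
    return int(np.count_nonzero(np.abs(result - image) > 1e-9))


if __name__ == "__main__":
    for size in (8, 64):
        print(size, steps_to_reach_all(size), pixels_changed_by_one_frequency_edit(size))
```

For a signal starting in a corner, the local rule needs `size - 1` steps, so the step count grows linearly with the image side length, whereas a single edit in Fourier space touches all `size * size` pixels at once. This only illustrates why a frequency-domain stage can speed up global communication; it says nothing about the learned update rules used in Diff-NCA or FourierDiff-NCA.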