Diffusion models degrade images by adding noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a similar hierarchy through low-pass filtering. We formalize this connection and show that highly noisy diffusion states contain no more information than small, downsampled images, which raises the question of why they must be processed at full resolution. To address this, we fuse scale spaces into the diffusion process by formulating a family of diffusion models with generalized linear degradations and practical implementations. Using downsampling as the degradation yields our proposed Scale Space Diffusion. To support Scale Space Diffusion, we introduce Flexi-UNet, a UNet variant that performs resolution-preserving and resolution-increasing denoising using only the necessary parts of the network. We evaluate our framework on CelebA and ImageNet and analyze its scaling behavior across resolutions and network depths. Our project website is publicly available at https://prateksha.github.io/projects/scale-space-diffusion/.
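
The idea of using downsampling as the degradation can be illustrated with a minimal sketch. The abstract does not state the exact formulation, so everything here is an assumption made for illustration: the function names `downsample` and `degrade`, the average-pooling operator, the linear signal schedule, and the rule that halves the resolution at evenly spaced timesteps are all hypothetical and are not taken from the paper.

```python
import numpy as np


def downsample(image, factor):
    """Average-pool an (H, W) image by an integer factor.

    Average pooling acts as a simple low-pass filter followed by subsampling.
    """
    h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def degrade(x0, t, num_steps, num_scales, rng):
    """Hypothetical forward degradation: coarser and noisier as t grows.

    Assumed form: x_t = alpha_t * D_t(x0) + sigma_t * eps, where D_t is
    downsampling by 2**level and eps is standard Gaussian noise.
    """
    # Assumed schedule: the scale level increases at evenly spaced timesteps.
    level = min(t * num_scales // num_steps, num_scales - 1)
    # Assumed linear signal schedule with variance-preserving noise scale.
    alpha = 1.0 - t / num_steps
    sigma = np.sqrt(1.0 - alpha ** 2)
    small = downsample(x0, 2 ** level)
    noise = rng.standard_normal(small.shape)
    return alpha * small + sigma * noise


rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64))
clean = degrade(x0, t=0, num_steps=100, num_scales=4, rng=rng)
mid = degrade(x0, t=50, num_steps=100, num_scales=4, rng=rng)
late = degrade(x0, t=99, num_steps=100, num_scales=4, rng=rng)
print(clean.shape, mid.shape, late.shape)
```

Under these assumed schedules, the state at `t=0` is the full-resolution image, while later states are both noisier and spatially smaller, so a denoiser would only need to process the resolution that the state actually has.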