We introduce the Pyramid Diffusion Model (PDM), a novel architecture designed for ultra-high-resolution image synthesis. PDM uses a pyramid latent representation, which provides a broader design space and enables more flexible, structured, and efficient perceptual compression. This in turn allows both the AutoEncoder and the diffusion network to incorporate branches and deeper layers. To strengthen PDM's generative capabilities, we integrate Spatial-Channel Attention and Res-Skip Connection, and apply Spectral Norm and a Decreasing Dropout Strategy to both the diffusion network and the AutoEncoder. As a result, PDM synthesizes images at 2K resolution for the first time, which we demonstrate on two new datasets of 2048x2048-pixel and 2048x1024-pixel images, respectively. We believe this work offers an alternative approach to designing scalable image generative models, while also providing incremental improvements that can reinforce existing frameworks.
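To make the idea of a pyramid latent representation more concrete, the following minimal sketch computes the spatial size of a latent at several scales for the two image sizes mentioned above. The function name, the base downsampling factor of 8, the three levels, and the halving between levels are all assumptions chosen for illustration; the abstract does not specify PDM's actual configuration.

```python
def pyramid_latent_shapes(height, width, base_factor=8, levels=3):
    """Return the (height, width) of each latent level, finest first.

    Hypothetical setup, not taken from the paper: level 0 downsamples the
    image by `base_factor`, and each further level halves the spatial size.
    Treating the first number of an image size as height is also an assumption.
    """
    shapes = []
    for level in range(levels):
        factor = base_factor * 2 ** level
        # Reject sizes that would not divide evenly at this level.
        if height % factor or width % factor:
            raise ValueError(f"{height}x{width} is not divisible by {factor}")
        shapes.append((height // factor, width // factor))
    return shapes


# The two image sizes named in the abstract.
for size in [(2048, 2048), (2048, 1024)]:
    print(size, pyramid_latent_shapes(*size))
```

Under these assumed factors, a 2048x2048 image would yield latents of 256x256, 128x128, and 64x64, which illustrates how a multi-scale latent can keep the coarsest level small even at 2K resolution.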