We propose Patch-DM, an effective denoising diffusion model that generates high-resolution images (e.g., 1024$\times$512) while being trained on small image patches (e.g., 64$\times$64). To avoid boundary artifacts when synthesizing large images, Patch-DM uses a new feature collage strategy. Feature collage systematically crops and combines partial features of neighboring patches to predict the features of a shifted image patch; because the patches overlap in feature space, the entire image can be generated seamlessly. Patch-DM produces high-quality image synthesis results on our newly collected dataset of nature images (1024$\times$512), as well as on standard benchmarks of smaller size (256$\times$256), including LSUN-Bedroom, LSUN-Church, and FFHQ. We compare our method with previous patch-based generation methods and achieve state-of-the-art FID scores on all four datasets. Patch-DM also reduces memory complexity compared to classic diffusion models.
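The cropping-and-combining step of feature collage can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single feature level, a regular grid of per-patch feature maps, and a shift of half a patch in both directions (the abstract does not state the shift size); the function name `feature_collage` and the array layout are hypothetical. Each shifted patch is assembled from one quarter of each of its four neighbors.

```python
import numpy as np

def feature_collage(feats):
    """Assemble features for half-patch-shifted patches (illustrative sketch).

    feats: array of shape (rows, cols, ch, h, w) holding one feature map
           per patch on a rows x cols grid (assumed layout).
    Returns an array of shape (rows - 1, cols - 1, ch, h, w) in which each
    entry is collaged from the four patches surrounding a shifted patch.
    """
    rows, cols, ch, h, w = feats.shape
    hh, hw = h // 2, w // 2  # assumed shift: half a patch in each direction
    out = np.empty((rows - 1, cols - 1, ch, h, w), dtype=feats.dtype)
    for i in range(rows - 1):
        for j in range(cols - 1):
            # Bottom-right part of the top-left neighbor.
            out[i, j, :, :h - hh, :w - hw] = feats[i, j, :, hh:, hw:]
            # Bottom-left part of the top-right neighbor.
            out[i, j, :, :h - hh, w - hw:] = feats[i, j + 1, :, hh:, :hw]
            # Top-right part of the bottom-left neighbor.
            out[i, j, :, h - hh:, :w - hw] = feats[i + 1, j, :, :hh, hw:]
            # Top-left part of the bottom-right neighbor.
            out[i, j, :, h - hh:, w - hw:] = feats[i + 1, j + 1, :, :hh, :hw]
    return out
```

Because every shifted patch borrows features from four neighbors, adjacent patches share information across their common boundary, which is the overlap the abstract credits for seamless generation. Border patches, which lack a full set of neighbors, are not handled in this sketch.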