Denoising Diffusion Probabilistic Models (DDPMs) exhibit remarkable capabilities in image generation, with studies suggesting that they can generalize by composing latent factors learned from the training data. In this work, we go further and study DDPMs trained on strictly separate subsets of the data distribution, with large gaps in the support of the latent factors. We show that such a model can effectively generate images in the unexplored, intermediate regions of the distribution. For instance, for a model trained on clearly smiling and non-smiling faces, we demonstrate a sampling procedure that can generate slightly smiling faces without reference images (zero-shot interpolation). We replicate these findings for other attributes as well as other datasets. Our code is available at https://github.com/jdeschena/ddpm-zero-shot-interpolation.