Diffusion models can generate high-quality images by iteratively denoising pure Gaussian noise. While previous research has primarily sought to improve control over image generation by adjusting the denoising process, we propose a novel direction: manipulating the initial noise to control the generated image. Through experiments on Stable Diffusion, we show that blocks of pixels in the initial latent image tend to generate specific content, and that modifying these blocks can significantly influence the generated image. In particular, modifying part of the initial latent image affects the corresponding region of the generated image while leaving other regions unaffected, which is useful for repainting tasks. Furthermore, we find that the generation preferences of pixel blocks are determined primarily by their values rather than by their positions. By moving pixel blocks that tend to generate user-desired content to user-specified regions, our approach achieves state-of-the-art performance in layout-to-image generation. Our results highlight the flexibility and power of initial image manipulation in controlling the generated image.
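To make the block-moving idea concrete, the following is a minimal sketch of exchanging two pixel blocks in an initial latent. It is an illustration only, not the procedure used in this work: the function name `swap_blocks`, the block size of 16, the block positions, and the latent shape of (4, 64, 64) (the shape commonly used by Stable Diffusion for 512x512 images) are all assumptions chosen for the example. How to identify which blocks tend to generate the desired content is not covered here.

```python
import numpy as np


def swap_blocks(latent, src, dst, size):
    """Return a copy of `latent` with two square blocks exchanged.

    latent: array of shape (C, H, W).
    src, dst: (row, col) top-left corners of the two blocks.
    size: side length of each block, in latent pixels.
    """
    (sr, sc), (dr, dc) = src, dst
    # Two equal-sized squares overlap only if they overlap on both axes.
    if abs(sr - dr) < size and abs(sc - dc) < size:
        raise ValueError("blocks overlap")
    out = latent.copy()
    out[:, dr:dr + size, dc:dc + size] = latent[:, sr:sr + size, sc:sc + size]
    out[:, sr:sr + size, sc:sc + size] = latent[:, dr:dr + size, dc:dc + size]
    return out


rng = np.random.default_rng(0)
# Assumed latent shape; a real pipeline would supply its own initial noise.
latent = rng.standard_normal((4, 64, 64))
# Hypothetical choice: move the top-left block to the position (32, 32).
moved = swap_blocks(latent, src=(0, 0), dst=(32, 32), size=16)
```

Swapping, rather than overwriting, only permutes the existing noise values, so the modified latent contains exactly the same set of values as the original Gaussian sample. The result could then be passed to a sampler that accepts user-supplied initial noise.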