Recent advances in text-to-image (T2I) diffusion models have facilitated creative and photorealistic image synthesis. Varying the random seed yields different images for a fixed text prompt. Technically, the seed controls the initial noise and, in multi-step diffusion inference, the noise used for reparameterization at intermediate timesteps of the reverse diffusion process. However, the specific impact of the random seed on the generated images remains relatively unexplored. In this work, we conduct a large-scale scientific study of the impact of random seeds during diffusion inference. Remarkably, we find that the best 'golden' seed achieves an FID of 21.60, whereas the worst 'inferior' seed reaches an FID of 31.97. Additionally, a classifier can predict the seed number used to generate an image with over 99.9% accuracy after just a few epochs of training, establishing that seeds are highly distinguishable from the images they generate. Encouraged by these findings, we examine the influence of seeds on interpretable visual dimensions. We find that certain seeds consistently produce grayscale images, prominent sky regions, or image borders. Seeds also affect image composition, including object location, size, and depth. Moreover, by leveraging the 'golden' seeds, we demonstrate improved image generation, including high-fidelity inference and diversified sampling. Our investigation extends to inpainting tasks, where we find that some seeds tend to insert unwanted text artifacts. Overall, our extensive analyses highlight the importance of selecting good seeds and offer practical utility for image generation.
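To make the role of the seed concrete, the following is a minimal sketch of how one seed fixes both the initial noise and the noise injected at intermediate reverse steps. It is a toy illustration, not the sampler studied in this work: the function names, the `shrink` and `sigma` parameters, and the shrink-toward-zero update that stands in for a learned denoiser are all assumptions made for the example.

```python
import random


def sample_noise_trajectory(seed, latent_size=4, num_steps=3):
    """Draw every random number that a given seed determines (hypothetical helper)."""
    rng = random.Random(seed)  # a single seeded generator drives all draws
    # Initial latent x_T, drawn from a standard Gaussian
    initial = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]
    # Fresh Gaussian noise for each intermediate reverse step
    per_step = [
        [rng.gauss(0.0, 1.0) for _ in range(latent_size)]
        for _ in range(num_steps)
    ]
    return initial, per_step


def toy_reverse_diffusion(seed, latent_size=4, num_steps=3, shrink=0.5, sigma=0.1):
    """Run a toy stochastic reverse process; the update rule is a placeholder."""
    x, per_step = sample_noise_trajectory(seed, latent_size, num_steps)
    for noise in per_step:
        # Stand-in for a denoising step: shrink the latent, then add seeded noise
        x = [shrink * xi + sigma * ni for xi, ni in zip(x, noise)]
    return x


if __name__ == "__main__":
    # Same seed gives the same sample; a different seed gives a different one
    print(toy_reverse_diffusion(0) == toy_reverse_diffusion(0))
    print(toy_reverse_diffusion(0) == toy_reverse_diffusion(1))
```

Because every draw comes from the same seeded generator, fixing the seed fixes the whole noise trajectory, which is why two runs with the same seed and prompt are reproducible while different seeds diverge.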