With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts or what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset totaling 6.5TB, containing 14 million images generated by Stable Diffusion, 1.8 million unique prompts, and hyperparameters specified by real users. We analyze the syntactic and semantic characteristics of prompts. We pinpoint specific hyperparameter values and prompt styles that can lead to model errors and present evidence of potentially harmful model usage, such as the generation of misinformation. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools that help users use these models more easily. DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.