With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts or what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset totaling 6.5TB, containing 14 million images generated by Stable Diffusion, 1.8 million unique prompts, and hyperparameters specified by real users. We analyze the syntactic and semantic characteristics of prompts. We pinpoint specific hyperparameter values and prompt styles that can lead to model errors and present evidence of potentially harmful model usage, such as the generation of misinformation. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models. DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.