Text-to-image models are increasingly popular and impactful, yet concerns about their safety and fairness remain. This study investigates the ability of ten popular Stable Diffusion models to generate harmful images, including NSFW, violent, and personally sensitive material. We demonstrate that these models respond to harmful prompts by generating inappropriate content, which frequently displays troubling biases, such as the disproportionate portrayal of Black individuals in violent contexts. None of the models we examined exhibited any refusal behavior or safety measures. We emphasize the importance of addressing this issue as image generation technologies continue to become more accessible and incorporated into everyday applications.