This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney, through a comprehensive literature review. While these models offer unprecedented capabilities for generating images, their development and use introduce new types of risk that require careful consideration. Our review reveals significant gaps in how these risks are understood and treated, even though some have already been addressed. We identify 22 distinct risk types, spanning issues from data bias to malicious use, and organize them into a taxonomy across six key stakeholder groups that includes issues not yet explored in the literature. We also suggest directions for future research. The investigation is intended to inform the ongoing discourse on responsible model development and deployment. By highlighting previously overlooked risks and gaps, it aims to shape subsequent research and governance initiatives, guiding them toward the responsible, secure, and ethically conscious evolution of text-to-image models.