Text-to-image generative models have made remarkable progress in producing high-quality visual content from textual descriptions, yet concerns remain about how they represent social groups. While characteristics such as gender and race have received increasing attention, the representation of disability remains underexplored. This study investigates how people with disabilities are depicted in AI-generated images by analyzing outputs from Stable Diffusion XL and DALL-E 3 under a structured prompt design. We analyze disability representations by comparing the similarity of images generated from generic disability prompts with those generated from prompts referring to specific disability categories. Moreover, we evaluate how mitigation strategies influence disability portrayals, focusing on affective framing assessed through sentiment polarity analysis that combines automatic and human evaluation. Our findings reveal persistent representational imbalances and highlight the need for continuous evaluation and refinement of generative models to foster more diverse and inclusive portrayals of disability.
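The group-wise image-similarity comparison described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the embeddings here are toy vectors, whereas in practice they would come from an image encoder (e.g., a CLIP-style model), and the aggregation choice (mean pairwise cosine similarity) is an assumption for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_similarity(group_a, group_b):
    """Average cosine similarity over all cross-group image pairs.

    group_a / group_b: lists of image embeddings, e.g. images generated
    from a generic disability prompt vs. a specific disability category.
    """
    sims = [cosine(u, v) for u in group_a for v in group_b]
    return sum(sims) / len(sims)

# Toy embeddings standing in for encoder outputs (hypothetical values).
generic_prompt_images = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
specific_prompt_images = [[0.85, 0.15, 0.05]]
score = mean_pairwise_similarity(generic_prompt_images, specific_prompt_images)
print(f"mean cross-group similarity: {score:.3f}")
```

A high cross-group similarity would suggest that images for a generic disability prompt closely resemble those for one specific category, hinting at a narrow default representation.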