Avoiding systemic discrimination against neurodiverse individuals is an ongoing challenge in training AI models, which often propagate negative stereotypes. This study examined whether six text-to-image models (Janus-Pro-7B VL2 vs. VL3, DALL-E 3 April 2024 vs. August 2025 versions, Stable Diffusion 1.6 vs. 3.5, SDXL April 2024 version vs. FLUX.1 Pro, and Midjourney 5.1 vs. 7) perpetuate non-rational beliefs about autism by comparing images generated in 2024 and 2025 with controls. Fifty-three prompts aimed at neutrally visualizing concrete objects and abstract concepts related to autism were used alongside 53 control prompts (baseline: N = 302 images in total; follow-up: 280 experimental images plus 265 control images). Expert assessment of the presence of common autism-related stereotypes used a framework of 10 deductive codes, followed by statistical analysis. Autistic individuals were depicted with striking homogeneity in skin color (white), gender (male), and age (young), often engaged in solitary activities, interacting with objects rather than people, and showing stereotypical emotional expressions such as sadness, anger, or emotional flatness. In contrast, images of neurotypical individuals were more diverse and lacked these traits. We found significant differences between the models, albeit with a moderate effect size, but no differences between baseline and follow-up summary values, and the ratio of stereotypical themes to the number of images was similar across all models. The control prompts showed a significantly lower degree of stereotyping, with large effect sizes, confirming the hidden biases of the models. In summary, despite improvements in the technical aspects of image generation, the degree to which the models reproduce potentially harmful autism-related stereotypes remained largely unchanged.