As Large Language Models (LLMs) become ubiquitous sources of health information, understanding their capacity to accurately represent stigmatized conditions is crucial for responsible deployment. This study examines whether leading AI systems perpetuate or challenge misconceptions about Autism Spectrum Disorder, a condition particularly vulnerable to harmful myths. We administered a 30-item instrument measuring autism knowledge to 178 human participants and to three state-of-the-art LLMs: GPT-4, Claude, and Gemini. Contrary to the expectation that AI systems would leverage their vast training data to outperform humans, we found the opposite pattern: human participants endorsed significantly fewer myths than the LLMs did (error rates of 36.2% vs. 44.8%; z = -2.59, p = .0048). On 18 of the 30 items, humans significantly outperformed the AI systems. These findings reveal a critical blind spot in current AI systems and carry important implications for human-AI interaction design, for the epistemology of machine knowledge, and for the need to center neurodivergent perspectives in AI development.
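The reported comparison of error rates (36.2% vs. 44.8%) is the kind of result typically tested with a pooled two-proportion z-test. The sketch below shows that computation in outline; the counts passed to the function are purely illustrative round numbers, not the study's raw data, so the resulting statistic is not expected to reproduce the reported z = -2.59.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z-test statistic.

    x1/n1: errors and total responses in group 1 (e.g., humans),
    x2/n2: errors and total responses in group 2 (e.g., LLMs).
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)          # pooled error proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Illustrative counts only (hypothetical, not taken from the study):
# group 1 with a ~36.2% error rate, group 2 with a ~44.8% error rate.
z = two_proportion_z(362, 1000, 448, 1000)
print(z)  # negative, since group 1 makes fewer errors than group 2
```

A negative z here simply reflects that the first group's error proportion is below the second's, matching the direction of the human-vs-LLM result in the abstract.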