As large language models (LLMs) become ubiquitous sources of health information, understanding how accurately they represent stigmatized conditions is crucial for their responsible deployment. This study examines whether leading AI systems perpetuate or challenge misconceptions about Autism Spectrum Disorder, a condition particularly vulnerable to harmful myths. We administered a 30-item autism knowledge instrument to 178 human participants and to three state-of-the-art LLMs: GPT-4, Claude, and Gemini. Contrary to the expectation that AI systems would leverage their vast training data to outperform humans, we found the opposite: human participants endorsed significantly fewer myths than the LLMs (error rates of 36.2% vs. 44.8%; z = -2.59, one-tailed p = .0048). Humans significantly outperformed the AI systems on 18 of the 30 items. These findings reveal a critical blind spot in current AI systems and carry important implications for human-AI interaction design, for the epistemology of machine knowledge, and for the need to center neurodivergent perspectives in AI development.
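The reported p-value can be checked directly against the standard normal distribution. The sketch below assumes the z statistic comes from a normal approximation and that its sign reflects the human error rate minus the LLM error rate (an assumption; the abstract does not state the ordering). It computes both the lower-tail (one-tailed) and two-tailed p-values for z = -2.59; only the one-tailed value matches the reported p = .0048.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Reported test statistic (assumed: human error rate minus LLM error rate).
z = -2.59

p_one_tailed = normal_cdf(z)              # P(Z <= z): lower tail only
p_two_tailed = 2.0 * normal_cdf(-abs(z))  # probability in both tails

print(f"one-tailed p = {p_one_tailed:.4f}")  # one-tailed p = 0.0048
print(f"two-tailed p = {p_two_tailed:.4f}")  # two-tailed p = 0.0096
```

A two-tailed test on the same statistic would give p ≈ .0096, which is still below .01 but not the value reported.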