As Large Language Models (LLMs) become ubiquitous sources of health information, understanding their capacity to accurately represent stigmatized conditions is crucial for responsible deployment. This study examines whether leading AI systems perpetuate or challenge misconceptions about Autism Spectrum Disorder, a condition particularly vulnerable to harmful myths. We administered a 30-item instrument measuring autism knowledge to 178 human participants and to three state-of-the-art LLMs: GPT-4, Claude, and Gemini. Contrary to the expectation that AI systems would leverage their vast training data to outperform humans, we found the opposite pattern: human participants endorsed significantly fewer myths than the LLMs did (error rates of 36.2% vs. 44.8%; z = -2.59, p = .0048). On 18 of the 30 items, humans significantly outperformed the AI systems. These findings reveal a critical blind spot in current AI systems and carry important implications for human-AI interaction design, for the epistemology of machine knowledge, and for the need to center neurodivergent perspectives in AI development.
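The reported comparison of error rates (36.2% vs. 44.8%) is the kind of result typically tested with a pooled two-proportion z-test. The sketch below shows that computation in outline; the counts passed to the function are purely illustrative round numbers, not the study's raw data, so the resulting statistic is not expected to reproduce the reported z = -2.59.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z-test statistic.

    x1/n1: errors and total responses in group 1 (e.g., humans),
    x2/n2: errors and total responses in group 2 (e.g., LLMs).
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)          # pooled error proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Illustrative counts only (hypothetical, not taken from the study):
# group 1 with a ~36.2% error rate, group 2 with a ~44.8% error rate.
z = two_proportion_z(362, 1000, 448, 1000)
print(z)  # negative, since group 1 makes fewer errors than group 2
```

A negative z here simply reflects that the first group's error proportion is below the second's, matching the direction of the human-vs-LLM result in the abstract.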