People with disabilities (PwD) regularly encounter ableist hate and microaggressions online. These spaces are generally moderated by machine learning models, but little is known about how effectively AI models identify ableist speech and how well their judgments align with those of PwD. To investigate this, we curated a first-of-its-kind dataset of 200 social media comments targeted towards PwD, and prompted state-of-the-art AI models (i.e., toxicity classifiers, LLMs) to score toxicity and ableism for each comment and explain their reasoning. Then, we recruited 190 participants to similarly rate and explain the harm, and to evaluate the LLM explanations. Our mixed-methods analysis highlighted a major disconnect: AI underestimated toxicity compared to PwD ratings, while its ableism assessments were sporadic and varied. Although LLMs identified some biases, their explanations were flawed: they lacked nuance, made incorrect assumptions, and appeared judgmental instead of educational. Going forward, we discuss challenges and opportunities in designing moderation systems for ableism, and advocate for the involvement of intersectional disabled perspectives in AI.