The global popularization of AI chatbots has created opportunities for research into their benefits and drawbacks, especially for students using AI assistants for coursework support. This paper asks: how do LLMs perceive the intellectual capabilities of student profiles with intersecting marginalized identities across different cultural contexts? We conduct one of the first large-scale intersectional analyses of LLM explanation quality for Indian and American undergraduate profiles preparing for engineering entrance examinations. By constructing profiles that combine multiple demographic dimensions, including caste, medium of instruction, and school board in India, and race, HBCU attendance, and school type in America, alongside universal factors such as income and college tier, we examine how explanation quality varies across these factors. We observe that models provide lower-quality outputs to profiles with marginalized backgrounds in both contexts. LLMs such as Qwen2.5-32B-Instruct and GPT-4o demonstrate granular understandings of context-specific discrimination, systematically providing simpler explanations to Hindi/regional-medium students in India and HBCU profiles in America, treating these attributes as proxies for lower capability. Even when marginalized profiles attain social mobility by gaining admission to elite institutions, they still receive more simplistic explanations, showing how demographic information is inextricably linked to LLM biases. Different models (Qwen2.5-32B-Instruct, GPT-4o, GPT-4o-mini, GPT-OSS 20B) embed similar biases against historically marginalized populations in both contexts, preventing users from obtaining better results by switching between AI assistants. Our findings have strong implications for the incorporation of AI into global engineering education.