Red-teaming, in which adversarial prompts are crafted to expose harmful behaviors and assess risks, offers a dynamic approach to surfacing underlying stereotypical bias in large language models. Because such subtle harms are best recognized by those with lived experience, involving targets of stereotyping as red-teamers is essential. However, critical challenges remain in leveraging their lived experience for red-teaming while safeguarding their psychological well-being. We conducted an empirical study of participatory red-teaming with 20 individuals stigmatized by stereotypes against graduates of non-prestigious colleges in South Korea. Through mixed-methods analysis, we found that participants transformed their experiences of discrimination into strategic expertise for identifying biases, while facing psychological costs such as stress and negative reflections on group identity. Notably, red-team participation enhanced their sense of agency and empowerment through their role as guardians of the AI ecosystem. We discuss implications for designing participatory red-teaming that prioritizes both the ethical treatment and empowerment of stigmatized groups.