Stereotypes are known to be deeply harmful, which makes detecting them critically important. However, current research predominantly focuses on detecting and evaluating stereotypical biases, leaving the study of stereotypes themselves in its early stages. We find that many works fail to distinguish clearly between stereotypes and stereotypical biases, which has significantly slowed progress in this area. Stereotype and anti-stereotype detection requires social knowledge, which makes it one of the most difficult problems in Responsible AI. We investigate this task, propose a five-tuple definition, and provide precise terminology that disentangles stereotypes, anti-stereotypes, stereotypical bias, and general bias. We provide a conceptual framework, grounded in social psychology, for reliable detection. We identify key shortcomings in existing benchmarks for stereotype and anti-stereotype detection. To address these gaps, we developed StereoDetect, a well-curated, definition-aligned benchmark dataset designed for this task. We show that sub-10B language models and GPT-4o frequently misclassify anti-stereotypes and fail to recognize neutral overgeneralizations. We demonstrate StereoDetect's effectiveness through multiple qualitative and quantitative comparisons with existing benchmarks and with models fine-tuned on them. The dataset and code are available at https://github.com/KaustubhShejole/StereoDetect.