We investigate the effect of automatically generated counter-stereotypes on the gender bias held by social media users of various demographics. Building on recent NLP advancements and social psychology literature, we evaluate two counter-stereotype strategies -- counter-facts and broadening universals (i.e., stating that anyone can have a trait regardless of group membership) -- which previous studies identified as potentially the most effective. We assess the real-world impact of these strategies on mitigating gender bias across user demographics (gender and age), using the Implicit Association Test and self-reported measures of explicit bias and perceived utility. Our findings reveal that actual effectiveness does not align with perceived effectiveness, and that actual effectiveness varies, sometimes in opposite directions, across demographic groups. While overall bias reduction was limited, certain groups (e.g., older, male participants) exhibited measurable improvements in implicit bias in response to some interventions. Conversely, younger participants, especially women, showed increased bias in response to the same interventions. These results highlight the complex and identity-sensitive nature of stereotype mitigation and call for dynamic and context-aware evaluation and mitigation strategies.