Warning: this paper contains content that may be offensive or upsetting. Most hate speech datasets neglect the cultural diversity within a single language, a critical shortcoming for hate speech detection. To address this, we introduce CREHate, a CRoss-cultural English Hate speech dataset. We construct CREHate in two steps: 1) cultural post collection and 2) cross-cultural annotation. We sample posts from the SBIC dataset, which predominantly represents North America, and collect additional posts from four geographically diverse English-speaking countries (Australia, United Kingdom, Singapore, and South Africa) using culturally hateful keywords retrieved from our survey. Annotations are collected from these four countries plus the United States to establish representative labels for each country. Our analysis reveals statistically significant disparities in hate speech annotations across countries. Only 56.2% of the posts in CREHate achieve consensus among all countries, and the highest pairwise label difference rate between countries is 26%. Qualitative analysis shows that label disagreement stems mostly from differing interpretations of sarcasm and from annotators' personal biases on divisive topics. Lastly, we evaluate large language models (LLMs) in a zero-shot setting and find that current LLMs tend to achieve higher accuracy on the labels of Anglosphere countries in CREHate. Our dataset and code are available at: https://github.com/nlee0212/CREHate
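The two agreement statistics quoted in the abstract (the share of posts on which all countries agree, and the pairwise label difference rate) can be illustrated with a minimal sketch. It assumes each post already carries one binary representative label per country; the country codes, the toy labels, and the function names `consensus_rate` and `pairwise_difference_rates` are hypothetical and are not taken from the CREHate data or code.

```python
from itertools import combinations

# Hypothetical country codes and toy labels (1 = hate, 0 = non-hate).
# These values are illustrative only and are NOT drawn from CREHate.
COUNTRIES = ["US", "AU", "GB", "SG", "ZA"]
posts = [
    {"US": 1, "AU": 1, "GB": 1, "SG": 1, "ZA": 1},
    {"US": 0, "AU": 0, "GB": 0, "SG": 1, "ZA": 0},
    {"US": 0, "AU": 0, "GB": 0, "SG": 0, "ZA": 0},
    {"US": 1, "AU": 0, "GB": 1, "SG": 0, "ZA": 1},
]


def consensus_rate(rows, countries):
    """Fraction of posts on which every country assigns the same label."""
    agreed = sum(1 for r in rows if len({r[c] for c in countries}) == 1)
    return agreed / len(rows)


def pairwise_difference_rates(rows, countries):
    """For each country pair, the fraction of posts whose labels differ."""
    rates = {}
    for a, b in combinations(countries, 2):
        differing = sum(1 for r in rows if r[a] != r[b])
        rates[(a, b)] = differing / len(rows)
    return rates


overall = consensus_rate(posts, COUNTRIES)
diffs = pairwise_difference_rates(posts, COUNTRIES)
worst_pair = max(diffs, key=diffs.get)

print(f"consensus rate: {overall:.1%}")
print(f"highest pairwise difference: {worst_pair} at {diffs[worst_pair]:.1%}")
```

On the real dataset, the same two functions would be applied to the per-country representative labels; how those labels are derived from individual annotations is not specified here and is left outside the sketch.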