Warning: this paper contains content that may be offensive or upsetting.

Hate speech moderation on global platforms poses unique challenges due to the multimodal and multilingual nature of content, along with varying cultural perceptions. How well do current vision-language models (VLMs) navigate these nuances? To investigate this, we create Multi3Hate, the first multimodal and multilingual parallel hate speech dataset annotated by a multicultural set of annotators. It contains 300 parallel meme samples across 5 languages: English, German, Spanish, Hindi, and Mandarin. We demonstrate that cultural background significantly affects multimodal hate speech annotation in our dataset. The average pairwise agreement among countries is just 74%, significantly lower than that of randomly selected annotator groups. Our qualitative analysis indicates that the lowest pairwise label agreement, only 67% between the USA and India, can be attributed to cultural factors. We then conduct experiments with 5 large VLMs in a zero-shot setting, finding that these models align more closely with annotations from the US than with those from other cultures, even when the memes and prompts are presented in the dominant language of the other culture. Code and dataset are available at https://github.com/MinhDucBui/Multi3Hate.