Hate speech poses a serious threat to social cohesion and individual well-being, particularly on social media, where it spreads rapidly. Although research on hate speech detection has advanced, it remains largely focused on English, which leaves low-resource languages with few resources and benchmarks. Moreover, many of these languages comprise multiple linguistic varieties, a factor often overlooked by current approaches. At the same time, large language models require substantial amounts of data to perform reliably, a requirement that low-resource languages often cannot meet. In this work, we address these gaps by compiling a meta-collection of hate speech datasets for European Spanish, standardised with unified labels and metadata. The collection is built from a systematic analysis and integration of existing resources, with the aim of bridging the data gap and supporting more consistent and scalable hate speech detection. We extend the collection by translating it into European Portuguese and into two Galician varieties: a standard that is more convergent with Spanish and a variety that is more convergent with Portuguese. The result is a set of aligned multilingual corpora. Using these resources, we establish new benchmarks for hate speech detection in Iberian languages. We evaluate state-of-the-art large language models in zero-shot, few-shot, and fine-tuning settings, providing baseline results for future research, and we perform a cross-lingual analysis across our target languages. Our findings underscore the importance of multilingual and variety-aware approaches to hate speech detection and provide a foundation for improved benchmarking in underrepresented European languages.