Hate speech on social media is a growing phenomenon, and detecting such toxic content has recently gained significant traction in the research community. Existing studies have explored fine-tuning language models (LMs) for hate speech detection, and these solutions have achieved strong performance. However, most of these studies are limited to detecting hate speech in English, neglecting the bulk of hateful content generated in other languages, particularly low-resource languages. Developing a classifier that captures hate speech and its nuances in a low-resource language with limited data is extremely challenging. To fill this research gap, we propose HateMAML, a model-agnostic meta-learning-based framework that effectively performs hate speech detection in low-resource languages. HateMAML uses a self-supervision strategy to overcome data scarcity and produces a better LM initialization for fast adaptation to an unseen target language (i.e., cross-lingual transfer) or to other hate speech datasets (i.e., domain generalization). We conduct extensive experiments on five datasets across eight different low-resource languages. The results show that HateMAML outperforms the state-of-the-art baselines by more than 3% in the cross-domain multilingual transfer setting. We also conduct ablation studies to analyze the characteristics of HateMAML.
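To make the idea of learning an initialization for fast adaptation concrete, the following is a minimal sketch of first-order model-agnostic meta-learning on a toy problem. It is not the HateMAML implementation: it assumes a single scalar parameter in place of an LM, treats each "language" as a regression task with its own slope, uses the first-order approximation of the meta-gradient, and omits the self-supervision strategy entirely. All function names and hyperparameters are hypothetical.

```python
import random


def mse_loss(w, batch):
    """Mean squared error of the toy model y_hat = w * x on a batch."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def mse_grad(w, batch):
    """Gradient of mse_loss with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def sample_batch(slope, rng, n=8):
    """Draw n (x, y) pairs from a toy task y = slope * x (stand-in for one language)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x) for x in xs]


def fomaml(task_slopes, steps=500, inner_lr=0.1, outer_lr=0.05, seed=0):
    """First-order MAML: learn an initialization w that adapts quickly to each task."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for slope in task_slopes:
            support = sample_batch(slope, rng)  # data used for adaptation
            query = sample_batch(slope, rng)    # held-out data for the meta-update
            # Inner loop: one gradient step on the support set.
            w_adapted = w - inner_lr * mse_grad(w, support)
            # First-order approximation: query gradient evaluated at the adapted weight.
            meta_grad += mse_grad(w_adapted, query)
        # Outer loop: update the shared initialization.
        w -= outer_lr * meta_grad / len(task_slopes)
    return w


def adapt(w, batch, inner_lr=0.1, n_steps=5):
    """Adapt the meta-learned initialization to a new task with a few gradient steps."""
    for _ in range(n_steps):
        w -= inner_lr * mse_grad(w, batch)
    return w


if __name__ == "__main__":
    w_meta = fomaml([1.0, 2.0, 3.0])
    unseen = sample_batch(2.5, random.Random(1))  # an "unseen target language"
    w_new = adapt(w_meta, unseen)
    print("loss before adaptation:", mse_loss(w_meta, unseen))
    print("loss after adaptation: ", mse_loss(w_new, unseen))
```

In this sketch the meta-learned `w` settles between the training slopes, so a handful of gradient steps on a small batch from an unseen task is enough to reduce the loss. This mirrors, in a much simplified form, the fast-adaptation behavior the abstract describes.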