A growing body of work has focused on text classification methods for detecting the increasing amount of hate speech posted online. This progress has been limited to a small number of high-resource languages, so detection systems either under-perform or do not exist in limited-data contexts. The main cause is a lack of training data, which is expensive to collect and curate in these settings. In this work, we propose a data augmentation approach that addresses the lack of data for online hate speech detection in limited-data contexts using synthetic data generation techniques. Given a handful of hate speech examples in a high-resource language such as English, we present three methods to synthesize new examples of hate speech data in a target language that retain the hate sentiment of the original examples but transfer the hate targets. We apply our approach to generate training data for hate speech classification tasks in Hindi and Vietnamese. Our findings show that a model trained on synthetic data performs comparably to, and in some cases outperforms, a model trained only on the samples available in the target domain. This method can be adopted to bootstrap hate speech detection models from scratch in limited-data contexts. As the growth of social media within these contexts continues to outstrip response efforts, this work furthers our capacity to detect, understand, and respond to hate speech.