Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment

Social media is awash with hateful content, much of it veiled by linguistic and topical diversity. The benchmark datasets used for hate speech detection do not account for this variation, as they are predominantly compiled using hate lexicons. However, hate signals are harder to capture in neutrally seeded malicious content. Thus, designing models and datasets that mimic the real-world variability of hate warrants further investigation. To this end, we present GOTHate, a large-scale, code-mixed, crowdsourced dataset of around 51k Twitter posts for hate speech detection. GOTHate is neutrally seeded and spans different languages and topics. We compare GOTHate in detail with existing hate speech datasets, highlighting its novelty, and benchmark it with 10 recent baselines. Our extensive empirical and benchmarking experiments suggest that GOTHate is hard to classify in a text-only setup. We therefore investigate how adding endogenous signals improves hate speech detection. We augment GOTHate with each user's timeline information and ego network, bringing the overall data source closer to the real-world setup for understanding hateful content. Our proposed solution, HEN-mBERT, is a modular, multilingual, mixture-of-experts model that enriches the linguistic subspace with latent endogenous signals from history, topology, and exemplars. HEN-mBERT outperforms the best baseline by 2.5% and 5% in overall macro-F1 and hate class F1, respectively. Inspired by our experiments, and in partnership with Wipro AI, we are developing a semi-automated pipeline to detect hateful content as part of their mission to tackle online harm.
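The two reported metrics, overall macro-F1 and hate class F1, can be illustrated with a minimal sketch. This is not the evaluation code behind the reported numbers: the function names, the two-label setup ("hate", "neutral"), and the toy predictions are all assumptions made for illustration only.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not drawn from GOTHate.
y_true = ["hate", "hate", "neutral", "neutral", "neutral", "neutral"]
y_pred = ["hate", "neutral", "neutral", "neutral", "neutral", "neutral"]

scores = per_class_f1(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(f"hate F1 = {scores['hate']:.3f}, macro-F1 = {overall:.3f}")
```

Because macro-F1 averages the classes without weighting them by frequency, a missed post in the rare hate class lowers the score more than a missed post in a frequent class, which is presumably why the hate class F1 is reported alongside it.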
