Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment

Social media is awash with hateful content, much of it veiled by linguistic and topical diversity. Benchmark datasets for hate speech detection do not account for this variation, as they are predominantly compiled using hate lexicons. Capturing hate signals is therefore challenging in neutrally seeded malicious content, and designing models and datasets that mimic the real-world variability of hate warrants further investigation. To this end, we present GOTHate, a large-scale, code-mixed, crowdsourced dataset of around 51k Twitter posts for hate speech detection. GOTHate is neutrally seeded and encompasses different languages and topics. We conduct detailed comparisons of GOTHate with existing hate speech datasets, highlighting its novelty, and benchmark it with 10 recent baselines. Our extensive empirical and benchmarking experiments suggest that GOTHate is hard to classify in a text-only setup. We therefore investigate how adding endogenous signals enhances the hate speech detection task. We augment GOTHate with each user's timeline information and ego network, bringing the overall data source closer to the real-world setup for understanding hateful content. Our proposed solution, HEN-mBERT, is a modular, multilingual, mixture-of-experts model that enriches the linguistic subspace with latent endogenous signals from history, topology, and exemplars. HEN-mBERT outperforms the best baseline by 2.5% and 5% in overall macro-F1 and hate class F1, respectively. Inspired by our experiments, in partnership with Wipro AI, we are developing a semi-automated pipeline to detect hateful content as a part of their mission to tackle online harm.
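The two metrics reported above, overall macro-F1 and hate class F1, can be illustrated with a minimal sketch. The label names (`hate`, `other`), the toy predictions, and the function names `per_class_f1` and `macro_f1` are assumptions made for illustration only; they are not taken from GOTHate or from the evaluation code used in this work.

```python
def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not drawn from any dataset.
y_true = ["hate", "hate", "other", "other"]
y_pred = ["hate", "other", "other", "other"]

print("hate class F1:", per_class_f1(y_true, y_pred)["hate"])
print("macro-F1:", macro_f1(y_true, y_pred))
```

Because macro-F1 averages the classes without weighting by frequency, a gain on a minority class such as hate moves the overall score more than it would under accuracy, which is one reason the two numbers are typically reported side by side.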
