Digital dehumanization, although a critical issue, remains largely overlooked within the fields of computational linguistics and Natural Language Processing. The prevailing approach in current research concentrates primarily on a single aspect of dehumanization, identifying overtly negative statements as its core marker. This focus, while crucial for understanding harmful online communication, inadequately addresses the broader spectrum of dehumanization. Specifically, it overlooks subtler forms of dehumanization that, despite not being overtly offensive, still perpetuate harmful biases against marginalized groups in online interactions. These subtler forms can insidiously reinforce negative stereotypes and biases without explicit offensiveness, making them harder to detect yet equally damaging. Recognizing this gap, we use different sampling methods to collect a theory-informed bilingual dataset from Twitter and Reddit. Using crowdworkers and experts to annotate 16,000 instances at the document and span level, we show that our dataset covers the different dimensions of dehumanization. This dataset serves both as a training resource for machine learning models and as a benchmark for evaluating future dehumanization detection techniques. To demonstrate its effectiveness, we fine-tune ML models on this dataset, achieving performance that surpasses state-of-the-art models in zero- and few-shot in-context settings.