Automatic hate speech detection is an important yet complex task that requires common-sense knowledge, familiarity with stereotypes of protected groups, and awareness of histories of discrimination, all of which may evolve over time. In this paper, we propose a group-specific approach to NLP for online hate speech detection. The approach has three parts: creating historical and linguistic knowledge about a particular protected group and infusing it into hate speech detection models; analyzing historical data about discrimination against that group to better predict spikes in hate speech against it; and critically evaluating hate speech detection models through the lenses of intersectionality and ethics. We demonstrate this approach through a case study on NLP for the detection of antisemitic hate speech. The case study synthesizes the current English-language literature on NLP for antisemitism detection, introduces a novel knowledge graph of antisemitic history and language from the 20th century to the present, and infuses information from the knowledge graph into a set of tweets on top of Logistic Regression and uncased DistilBERT baselines. The results suggest that incorporating context from the knowledge graph can help models pick up subtle stereotypes.
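The abstract does not spell out how knowledge-graph information is infused into the tweets, so the following is only a minimal sketch of one possible reading: text augmentation, where the description of each matched knowledge-graph term is appended to the post before it reaches a classifier. The dictionary `TOY_KG`, the function `infuse`, the `[KG]` separator, and the placeholder entries are all hypothetical and are not taken from the paper's knowledge graph or code.

```python
import re

# Hypothetical toy knowledge graph: term -> short context description.
# The entries are neutral placeholders, not content from the paper's graph.
TOY_KG = {
    "term_a": "placeholder description of a coded stereotype",
    "term_b": "placeholder description of a historical trope",
}


def infuse(text, kg, sep=" [KG] "):
    """Append the description of every knowledge-graph term found in text.

    Matching is case-insensitive and limited to single-token terms;
    this is a simplifying assumption made for the sketch.
    """
    tokens = set(re.findall(r"[a-z0-9_]+", text.lower()))
    # Sort the matches so the augmented string is deterministic.
    matches = sorted(term for term in kg if term in tokens)
    if not matches:
        return text
    return text + sep + " ".join(kg[term] for term in matches)


augmented = infuse("Example post mentioning TERM_A here", TOY_KG)
unchanged = infuse("Example post with no match", TOY_KG)
print(augmented)
print(unchanged)
```

Under this assumption, the augmented strings would replace the raw tweets as input to a baseline such as a bag-of-words Logistic Regression or a DistilBERT tokenizer, which lets the classifier see context that the original post leaves implicit.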