Warning: This paper contains examples of language that some readers may find offensive. Detecting and reducing hateful, abusive, and offensive comments on social media is a critical and challenging task. Moreover, few studies aim to mitigate the intensity of hate speech. Although studies have shown that context-level semantics are crucial for detecting hateful comments, most of this research focuses on English because ample datasets are available. In contrast, low-resource languages, such as Indian languages, remain under-researched because datasets are limited. Unlike hate speech detection, hate intensity reduction remains unexplored in both high-resource and low-resource languages. In this paper, we propose a novel end-to-end model, HCDIR, for Hate Context Detection and Hate Intensity Reduction in social media posts. First, we fine-tune several pre-trained language models on hateful comment detection to determine the best-performing detector. Then, we identify the contextual hateful words, and justify their identification with a state-of-the-art explainability method, Integrated Gradients (IG). Lastly, we employ a Masked Language Modeling (MLM) model to capture domain-specific nuances and reduce hate intensity. We mask 50% of the hateful words in each comment identified as hateful and predict alternative words for the masked terms to generate convincing sentences. From these feasible sentences, we select the optimal replacement for the original hateful comment. We conduct extensive experiments on several recent datasets using automatic metric-based evaluation (BERTScore) and thorough human evaluation. To make the human evaluation more faithful, we assembled a group of three human annotators with varied expertise.
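As a rough illustration of the masking step described above, the following minimal Python sketch masks half of the words flagged as hateful in a comment. It is not the HCDIR implementation. The function name `mask_hateful_words`, the `[MASK]` symbol, the 0.5 attribution threshold, the choice to mask the highest-scoring words first, and rounding the count up are all assumptions made for this example. The per-word scores stand in for attributions such as those IG might produce, and the words `BAD1` and `BAD2` are neutral placeholders.

```python
import math

# Assumed mask symbol; the real token depends on the tokenizer of the MLM in use.
MASK_TOKEN = "[MASK]"


def mask_hateful_words(words, scores, threshold=0.5, ratio=0.5):
    """Mask a fraction (`ratio`) of the words whose score exceeds `threshold`.

    `scores` are hypothetical per-word attribution scores (e.g. from an
    explainability method); higher means more hateful. Returns the masked
    word list and the sorted indices that were masked.
    """
    # Indices of words flagged as hateful (assumption: simple thresholding).
    hateful = [i for i, s in enumerate(scores) if s > threshold]
    if not hateful:
        return list(words), []

    # Number of words to mask: `ratio` of the hateful words, rounded up
    # (assumption), so that at least one word is always masked.
    k = max(1, math.ceil(len(hateful) * ratio))

    # Assumption: mask the highest-scoring hateful words first.
    ranked = sorted(hateful, key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])

    masked = [MASK_TOKEN if i in chosen else w for i, w in enumerate(words)]
    return masked, chosen


if __name__ == "__main__":
    words = ["you", "are", "a", "BAD1", "and", "BAD2", "person"]
    scores = [0.05, 0.02, 0.01, 0.9, 0.03, 0.7, 0.1]
    masked, chosen = mask_hateful_words(words, scores)
    print(" ".join(masked), chosen)
```

In a full pipeline, the masked sentence would then be passed to a masked language model to propose replacement words, and the candidate sentences would be ranked to pick the replacement; those steps are omitted here because they depend on the specific models used.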