HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments

Warning: This paper contains examples of language that some people may find offensive.

Detecting and reducing hateful, abusive, and offensive comments on social media is a critical and challenging task. Few studies, however, aim to mitigate the intensity of hate speech. While studies have shown that context-level semantics are crucial for detecting hateful comments, most of this research focuses on English because of the ample datasets available. Low-resource languages, such as Indian languages, remain under-researched because datasets are limited. Unlike hate speech detection, hate intensity reduction remains unexplored in both high-resource and low-resource languages. In this paper, we propose a novel end-to-end model, HCDIR, for Hate Context Detection and Hate Intensity Reduction in social media posts. First, we fine-tuned several pre-trained language models on hateful comment detection to identify the best-performing detector. Then, we identified the contextual hateful words; this identification is justified using a state-of-the-art explainability method, Integrated Gradients (IG). Lastly, a Masked Language Modeling (MLM) model was employed to capture domain-specific nuances and reduce hate intensity. We masked 50% of the hateful words in the comments identified as hateful and predicted alternative words for the masked terms to generate plausible sentences. From these feasible sentences, the optimal replacement for the original hateful comment is selected. Extensive experiments were conducted on several recent datasets using automatic metric-based evaluation (BERTScore) and thorough human evaluation. To make the human evaluation more reliable, we engaged a group of three human annotators with varied expertise.
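The masking step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes that per-token attribution scores (for example from Integrated Gradients) are already available, that a token counts as hateful when its score exceeds a hypothetical threshold, that the highest-scoring half of the hateful tokens is the half that gets masked, and that the count is rounded up. The abstract specifies none of these details. The names `select_mask_positions`, `mask_tokens`, and `hate_threshold` are made up for this example.

```python
import math
from typing import List, Sequence

MASK = "[MASK]"  # BERT-style mask token; other MLMs may use a different string


def select_mask_positions(
    attributions: Sequence[float],
    hate_threshold: float = 0.1,  # hypothetical cut-off, not from the paper
    mask_ratio: float = 0.5,      # the abstract states that 50% are masked
) -> List[int]:
    """Return positions of the hateful tokens to mask, in sentence order."""
    # Tokens whose attribution exceeds the threshold are treated as hateful.
    hateful = [i for i, score in enumerate(attributions) if score > hate_threshold]
    if not hateful:
        return []
    # Assumption: round up so that at least one hateful token is masked.
    k = math.ceil(len(hateful) * mask_ratio)
    # Assumption: mask the tokens with the highest attribution first.
    top = sorted(hateful, key=lambda i: attributions[i], reverse=True)[:k]
    return sorted(top)


def mask_tokens(tokens: Sequence[str], positions: Sequence[int]) -> List[str]:
    """Replace the tokens at the given positions with the mask token."""
    chosen = set(positions)
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


tokens = ["you", "are", "a", "stupid", "ugly", "idiot", "person"]
scores = [0.0, 0.01, 0.0, 0.6, 0.3, 0.8, 0.02]  # made-up attribution values

positions = select_mask_positions(scores)
masked = mask_tokens(tokens, positions)
print(" ".join(masked))
```

In a full pipeline, the masked sentence would then be passed to a masked language model to propose replacement words, and the candidate sentences would be ranked to pick the final rewrite. Those stages are omitted here because the abstract does not describe how candidates are scored.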

