Although pretrained large language models (PLMs) have achieved state-of-the-art performance on many natural language processing (NLP) tasks, they lack an understanding of subtle expressions of implicit hate speech. Various attempts have been made to improve the detection of implicit hate, either by augmenting external context or by enforcing label separation via distance-based metrics. Combining these two approaches, we introduce FiADD, a novel Focused Inferential Adaptive Density Discrimination framework. FiADD enhances the PLM finetuning pipeline by bringing the surface form of implicit hate speech closer to its implied form while increasing the inter-cluster distance among the various labels. We evaluate FiADD on three implicit hate datasets and observe significant improvements in both the two-way and three-way hate classification tasks. We further test the generalizability of FiADD on three other tasks in which surface and implied forms differ, namely sarcasm, irony, and stance detection, and observe similar performance gains. Finally, we analyze how the latent space evolves under FiADD, which corroborates the advantage of employing FiADD for implicit hate speech detection.