Although pre-trained large language models (PLMs) have achieved state-of-the-art performance on many NLP tasks, they struggle to understand subtle expressions of implicit hate speech. Such nuanced, implicit hate is often misclassified as non-hate. Previous attempts to improve the detection of (implicit) hate content either augment the input with external context or enforce label separation via distance-based metrics. We combine these two approaches and introduce FiADD, a novel Focused Inferential Adaptive Density Discrimination framework. FiADD enhances the PLM finetuning pipeline by bringing the surface form of implicit hate speech closer to its implied form while increasing the inter-cluster distance among the class labels. We test FiADD on three implicit hate datasets and observe significant improvement in the two-way and three-way hate classification tasks. We further examine the generalizability of FiADD on three other tasks in which surface and implied forms differ, namely detecting sarcasm, irony, and stance, and observe similar performance improvement. Finally, we analyze the generated latent space to understand how it evolves under FiADD; the analysis corroborates the advantage of employing FiADD for implicit hate speech detection.
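The two ingredients named above, pulling a surface form toward its implied form and pushing class clusters apart, can be illustrated with a toy objective. This is a minimal sketch under stated assumptions, not the FiADD loss itself, which the abstract does not specify: the function names `centroid` and `toy_objective`, the squared-distance pull term, the hinge-style push term, and the `margin` parameter are all hypothetical choices made for illustration.

```python
import math


def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def toy_objective(surface, implied, labels, margin=1.0):
    """Hypothetical toy loss (an assumption, not the paper's formulation).

    surface: embeddings of the surface forms, one vector per sample
    implied: embeddings of the implied forms, aligned with `surface`
             (pass the surface vector itself for samples with no implied form)
    labels:  class label per sample
    margin:  minimum desired distance between class centroids
    """
    # Pull term: mean squared distance between surface and implied embeddings.
    pull = sum(math.dist(s, m) ** 2 for s, m in zip(surface, implied)) / len(surface)

    # Push term: hinge penalty for every pair of class centroids closer than `margin`.
    classes = sorted(set(labels))
    cents = [
        centroid([s for s, y in zip(surface, labels) if y == c]) for c in classes
    ]
    push = 0.0
    for i in range(len(cents)):
        for j in range(i + 1, len(cents)):
            push += max(0.0, margin - math.dist(cents[i], cents[j]))

    return pull + push


# Usage: two well-separated classes whose surface and implied forms coincide.
points = [[0.0, 0.0], [0.0, 0.0], [3.0, 0.0], [3.0, 0.0]]
print(toy_objective(points, points, [0, 0, 1, 1]))  # prints 0.0
```

In this sketch, minimizing the pull term moves a surface embedding toward its implied counterpart, and minimizing the push term separates class clusters until their centroids are at least `margin` apart. In a finetuning setup, a term like this would typically be added to the usual classification loss, though how FiADD weights or adapts such terms is not stated in this abstract.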