Identifying the targets of hate speech is a crucial step toward understanding the nature of such speech and, ultimately, toward improving the detection of offensive posts on online forums. Much harmful content on online platforms uses implicit language, especially when it targets vulnerable and protected groups. For example, it may refer to stereotypical characteristics instead of naming the target explicitly, which makes such content harder to detect and mitigate. In this study, we focus on identifying the implied targets of hate speech, a step that is essential for recognizing subtler hate speech and for improving the detection of harmful content on digital platforms. We define a new task that aims to identify targets even when they are not explicitly stated. To address this task, we collect and annotate target spans in three prominent implicit hate speech datasets: SBIC, DynaHate, and IHC. We call the resulting merged collection Implicit-Target-Span. The collection is built using an innovative pooling method with matching scores based on human annotations and Large Language Models (LLMs). Our experiments indicate that Implicit-Target-Span provides a challenging test bed for target span detection methods.
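Target spans produced by a detector or by different annotators are commonly compared by token overlap. The sketch below illustrates one such matching score: a token-level F1 between two spans, plus a helper that takes the best match of a candidate span against a pool of reference spans (for instance, spans from several human annotators or an LLM). This is a widely used span-matching metric shown for illustration only. It is an assumption, not necessarily the matching score used to build Implicit-Target-Span, and the names `span_f1` and `best_match` as well as the lowercase whitespace tokenization are hypothetical.

```python
from collections import Counter


def span_tokens(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def span_f1(predicted: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold target span."""
    pred = span_tokens(predicted)
    ref = span_tokens(gold)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts shared tokens, respecting repeats.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_match(candidate: str, pool: list[str]) -> float:
    """Highest F1 of a candidate span against a pool of reference spans."""
    return max((span_f1(candidate, ref) for ref in pool), default=0.0)


if __name__ == "__main__":
    # A predicted span that is longer than the reference gets partial credit.
    print(span_f1("women who drive", "women"))
    print(best_match("women", ["women", "men"]))
```

Here `span_f1("women who drive", "women")` gives about 0.5 (precision 1/3, recall 1), so a prediction that over-extends the target still receives partial credit, while an exact match in the pool yields 1.0.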