AIWizards at MULTIPRIDE: A Hierarchical Approach to Slur Reclamation Detection

Detecting reclaimed slurs represents a fundamental challenge for hate speech detection systems, as the same lexical items can function either as abusive expressions or as in-group affirmations depending on social identity and context. In this work, we address Subtask B of the MultiPRIDE shared task at EVALITA 2026 by proposing a hierarchical approach to modeling the slur reclamation process. Our core assumption is that members of the LGBTQ+ community are more likely, on average, to employ certain slurs in a reclamatory manner. Based on this hypothesis, we decompose the task into two stages. First, using weakly supervised LLM-based annotation, we assign fuzzy labels to users indicating the likelihood of belonging to the LGBTQ+ community, inferred from the tweet and the user bio. These soft labels are then used to train a BERT-like model to predict community membership, encouraging the model to learn latent representations associated with LGBTQ+ identity. In the second stage, we integrate this latent space with a newly initialized model for the downstream slur reclamation detection task. The intuition is that the first model encodes user-oriented sociolinguistic signals, which are then fused with representations learned by a model pretrained for hate speech detection. Experimental results on Italian and Spanish show that our approach achieves performance statistically comparable to a strong BERT-based baseline, while providing a modular and extensible framework for incorporating sociolinguistic context into hate speech modeling. We argue that more fine-grained hierarchical modeling of user identity and discourse context may further improve the detection of reclaimed language. We release our code at https://github.com/LucaTedeschini/multipride.
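The two-stage idea can be illustrated with a minimal, dependency-free sketch. It is not the released implementation: the function names (`soft_bce`, `fuse`, `classify`) are hypothetical, the soft-label objective is assumed to be binary cross-entropy against a fuzzy target, and fusion is assumed to be plain concatenation followed by a logistic head, since the abstract does not specify either choice.

```python
import math
from typing import List, Sequence


def soft_bce(p_pred: float, p_soft: float, eps: float = 1e-12) -> float:
    """Stage 1 (assumed objective): binary cross-entropy against a soft
    label in [0, 1], e.g. an LLM-assigned membership likelihood."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p_soft * math.log(p) + (1.0 - p_soft) * math.log(1.0 - p))


def fuse(user_repr: Sequence[float], text_repr: Sequence[float]) -> List[float]:
    """Stage 2 (assumed operator): concatenate the user-oriented latent
    vector with the hate-speech model's text representation."""
    return list(user_repr) + list(text_repr)


def classify(fused: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical logistic head: probability that the slur is reclaimed."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused vector length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy vectors standing in for encoder outputs (illustrative values only).
    user_vec = [0.2, -0.1]
    text_vec = [0.5, 0.3, -0.4]
    fused_vec = fuse(user_vec, text_vec)
    prob = classify(fused_vec, weights=[0.1] * len(fused_vec), bias=0.0)
    print(len(fused_vec), round(prob, 3))
    print(round(soft_bce(0.7, 0.7), 3))
```

Under these assumptions, the soft-label loss is smallest when the predicted probability matches the fuzzy label, which is what lets the first-stage model absorb graded rather than binary membership signals.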
