AI-UPV at EXIST 2023 -- Sexism Characterization Using Large Language Models Under The Learning with Disagreements Regime

As social media platforms grow in influence, automated systems that detect sexism and other disrespectful or hateful behavior have become crucial for promoting a more inclusive and respectful online environment. These tasks remain challenging, however, because of the variety of hate categories and author intentions involved, especially under the learning with disagreements regime. This paper describes the AI-UPV team's participation in the EXIST (sEXism Identification in Social neTworks) Lab at CLEF 2023. The proposed approach addresses sexism identification and characterization under the learning with disagreements paradigm by training directly on the data with disagreements, without using any aggregated label. Performance is nevertheless reported under both soft and hard evaluations. The proposed system uses large language models (i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification and classification in English and Spanish. In particular, the system is organized into three different pipelines. The ensemble approach outperformed the individual large language models, obtaining the best performance under both soft and hard label evaluation. The team participated in all three EXIST tasks; under the soft evaluation, the system obtained fourth place in Task 2 and first place in Task 3, with the highest ICM-Soft of -2.32 and a normalized ICM-Soft of 0.79. The source code of our approaches is publicly available at https://github.com/AngelFelipeMP/Sexism-LLM-Learning-With-Disagreement.
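To make the idea of training on disagreements without an aggregated label more concrete, the following minimal sketch shows one common way to turn individual annotator votes into a soft label and to score a prediction against it with cross-entropy. It is an illustration under stated assumptions, not the authors' implementation: the function names, the example votes, the class names, and the predicted distribution are all hypothetical, and the actual pipelines in the repository may differ.

```python
import math
from collections import Counter


def soft_label(votes, classes):
    """Turn raw annotator votes into a probability distribution over classes."""
    counts = Counter(votes)
    total = len(votes)
    return [counts[c] / total for c in classes]


def hard_label(soft, classes):
    """Pick the most probable class (ties are resolved by class order)."""
    best_index = max(range(len(classes)), key=lambda i: soft[i])
    return classes[best_index]


def soft_cross_entropy(predicted, target, eps=1e-12):
    """Cross-entropy between a predicted distribution and a soft target."""
    return -sum(t * math.log(p + eps) for p, t in zip(predicted, target))


# Hypothetical example: annotators disagree on whether a post is sexist.
classes = ["NO", "YES"]
votes = ["YES", "YES", "NO", "YES", "NO", "YES"]

target = soft_label(votes, classes)   # distribution kept as the training signal
majority = hard_label(target, classes)  # only needed for a hard evaluation

prediction = [0.4, 0.6]               # assumed model output after softmax
loss = soft_cross_entropy(prediction, target)
```

In this sketch the disagreement among annotators is kept in `target` instead of being collapsed into a single class, and `hard_label` is applied only when a hard evaluation is required.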
