This paper describes our participation in SemEval-2023 Task 10, whose goal is the detection of sexism in social media. We participated in all three subtasks. We explore several of the most popular transformer models, namely BERT, DistilBERT, RoBERTa, and XLNet, and we study different data augmentation techniques to enlarge the training dataset. During the development phase, our best results for tasks B and C were obtained using RoBERTa and data augmentation. However, the use of synthetic data does not improve the results for task C. Our approach still has much room for improvement, especially in the two fine-grained classification subtasks. All our code is available in the repository https://github.com/isegura/hulat_edos.