This paper describes our participation in SemEval-2023 Task 10, whose goal is to detect sexism in social media. We explore some of the most popular transformer models, such as BERT, DistilBERT, RoBERTa, and XLNet. We also study different data augmentation techniques to enlarge the training dataset. During the development phase, our best results were obtained by using RoBERTa and data augmentation for tasks B and C. However, the use of synthetic data did not improve the results for task C. We participated in all three subtasks. Our approach still has much room for improvement, especially in the two fine-grained classification subtasks. All our code is available in the repository https://github.com/isegura/hulat_edos.