This paper describes the system of the LowResource Team for Task 2 of BLP-2023, which involves sentiment analysis on a dataset of public posts and comments from diverse social media platforms. Our primary aim is to leverage BanglaBert, a BERT model pre-trained on a large Bangla corpus, through several strategies: fine-tuning, randomly dropping tokens, and using several external datasets. Our final model is an ensemble of the three best BanglaBert variants. Our system ranked 3rd overall on the test set among 30 participating teams, with a score of 0.718. Additionally, we discuss promising approaches that did not perform well, namely task-adaptive pre-training and paraphrasing using BanglaT5. The training code and external datasets used for our system are publicly available at https://github.com/Aunabil4602/bnlp-workshop-task2-2023