The rapid expansion of the digital world has made sentiment analysis a critical tool across diverse sectors such as marketing, politics, customer service, and healthcare. While sentiment analysis has advanced significantly for widely spoken languages, low-resource languages such as Bangla remain largely under-researched due to resource constraints. Furthermore, the recent unprecedented performance of Large Language Models (LLMs) in various applications highlights the need to evaluate them in the context of low-resource languages. In this study, we present a sizeable manually annotated dataset encompassing 33,606 Bangla news tweets and Facebook comments. We also investigate zero- and few-shot in-context learning with several language models, including Flan-T5, GPT-4, and Bloomz, offering a comparative analysis against fine-tuned models. Our findings suggest that monolingual transformer-based models consistently outperform other models, even in zero- and few-shot scenarios. To foster continued exploration, we intend to make this dataset and our research tools publicly available to the broader research community.