Climate change poses critical challenges globally, disproportionately affecting low-income countries that often lack resources and linguistic representation on the international stage. Although Bangladesh is one of the nations most vulnerable to climate impacts, research gaps persist in Bengali-language studies at the intersection of climate change and NLP. To address this disparity, we introduce Dhoroni, a novel Bengali (Bangla) climate change and environmental news dataset comprising 2,300 annotated Bangla news articles. The annotations cover multiple perspectives, including political influence, scientific/statistical data, authenticity, stance detection, and stakeholder involvement. Furthermore, we present an in-depth exploratory analysis of Dhoroni and introduce the BanglaBERT-Dhoroni family, a novel family of baseline models for climate and environmental opinion detection in Bangla, fine-tuned on our dataset. This research contributes to the accessibility and analysis of climate discourse in Bengali (Bangla), addressing crucial communication and research gaps in climate-impacted regions such as Bangladesh, home to 180 million people.