The rapid growth of the financial sector and the growing focus on Environmental, Social, and Governance (ESG) considerations highlight the need for advanced NLP tools. However, open-source LLMs proficient in both the finance and ESG domains remain scarce. To address this gap, we introduce SusGen-30K, a category-balanced dataset comprising seven financial NLP tasks and ESG report generation, and propose TCFD-Bench, a benchmark for evaluating sustainability report generation. Using this dataset, we develop SusGen-GPT, a suite of models that achieves state-of-the-art performance across six adapted and two off-the-shelf tasks, trailing GPT-4 by only 2% despite having 7-8B parameters versus GPT-4's 1,700B. Building on these models, we propose the SusGen system, which integrates Retrieval-Augmented Generation (RAG) to assist in sustainability report generation. This work demonstrates the efficiency of our approach and advances research in finance and ESG.