ESGReveal: An LLM-based approach for extracting structured data from ESG reports

ESGReveal is a method for efficiently extracting and analyzing Environmental, Social, and Governance (ESG) data from corporate reports, addressing the critical need for reliable ESG information retrieval. The approach uses Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) techniques. The ESGReveal system comprises an ESG metadata module for targeted queries, a preprocessing module for assembling databases, and an LLM agent for data extraction. Its efficacy was evaluated on ESG reports from 166 companies across various sectors listed on the Hong Kong Stock Exchange in 2022, ensuring broad representation of industries and market capitalizations. With GPT-4, ESGReveal achieved an accuracy of 76.9% in data extraction and 83.7% in disclosure analysis, an improvement over baseline models that highlights the framework's capacity to improve the precision of ESG data analysis. The evaluation also revealed a need for stronger ESG disclosure: environmental and social data disclosures stood at 69.5% and 57.2%, respectively, suggesting room for greater corporate transparency. The current version of ESGReveal does not process pictorial information, a capability intended for a future enhancement, and the study calls for continued research to further develop and compare the analytical capabilities of various LLMs. In summary, ESGReveal is a step forward in ESG data processing, offering stakeholders a tool to better evaluate and advance corporate sustainability efforts. Its further development holds promise for promoting transparency in corporate reporting and aligning with broader sustainable development aims.
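The three components named in the abstract (a metadata module, a preprocessing module, and an LLM agent driven by retrieval) can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions, not the ESGReveal implementation: the `Indicator` class, the word-overlap retriever, the regex-based `stub_llm_extract` function (standing in for a real LLM call such as GPT-4), and the sample report text are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Indicator:
    """One entry of a hypothetical ESG metadata module: a name plus a query."""
    name: str
    query: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_chunks(report_text: str) -> list[str]:
    """Preprocessing stand-in: split a report into paragraph chunks."""
    return [p.strip() for p in report_text.split("\n\n") if p.strip()]


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Toy retriever: rank chunks by word overlap with the query.

    A real RAG system would more likely use embeddings; this is an assumption
    made to keep the sketch dependency-free.
    """
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return [c for c in ranked[:top_k] if q & tokenize(c)]


def stub_llm_extract(indicator: Indicator, context: list[str]) -> dict:
    """Placeholder for the LLM agent: take the first number in the context."""
    for chunk in context:
        match = re.search(r"\d[\d,]*(?:\.\d+)?", chunk)
        if match:
            value = float(match.group().replace(",", ""))
            return {"indicator": indicator.name, "disclosed": True, "value": value}
    return {"indicator": indicator.name, "disclosed": False, "value": None}


def extract_report(report_text: str, indicators: list[Indicator]) -> list[dict]:
    """Run metadata-driven retrieval and extraction for every indicator."""
    chunks = build_chunks(report_text)
    return [stub_llm_extract(ind, retrieve(ind.query, chunks)) for ind in indicators]


# Hypothetical two-paragraph report and two metadata entries.
REPORT = """Greenhouse gas emissions were 12,500 tonnes CO2e in 2022.

The board met six times during the year."""

INDICATORS = [
    Indicator("ghg_emissions", "greenhouse gas emissions"),
    Indicator("water_consumption", "water consumption"),
]

results = extract_report(REPORT, INDICATORS)
# Share of indicators for which any supporting passage was found.
disclosure_rate = sum(r["disclosed"] for r in results) / len(results)
print(results, disclosure_rate)
```

In this sketch an indicator counts as disclosed only when the retriever finds a passage that shares words with its query, which loosely mirrors the distinction the abstract draws between data extraction and disclosure analysis.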


