Environmental, social, and governance (ESG) criteria are essential for evaluating corporate sustainability and ethical performance. However, professional ESG analysis is hindered by data fragmented across unstructured sources, and existing large language models (LLMs) often struggle with the complex, multi-step workflows that rigorous auditing requires. To address these limitations, we introduce ESGAgent, a hierarchical multi-agent system equipped with a specialized toolset, including retrieval augmentation, web search, and domain-specific functions, to generate in-depth ESG analysis. Complementing this agentic system, we present a comprehensive three-level benchmark derived from 310 corporate sustainability reports, designed to evaluate capabilities ranging from atomic common-sense questions to the generation of integrated, in-depth analysis. Empirical evaluations show that ESGAgent achieves an average accuracy of 84.15% on atomic question-answering tasks, outperforming state-of-the-art closed-source LLMs, and excels at professional report generation by integrating rich charts and verifiable references. These findings confirm the diagnostic value of our benchmark and establish it as a vital testbed for assessing both general and advanced agentic capabilities in high-stakes vertical domains.
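A hierarchical multi-agent system with a toolset can be pictured as a minimal two-level sketch. In it, a planner decomposes a request into subtasks and routes each one to a worker agent bound to a single tool (retrieval, web search, or a domain-specific function). This is not ESGAgent's implementation. All class and function names are hypothetical, the tools are stubs that return placeholder strings, and the keyword router stands in for whatever (presumably LLM-driven) planning and routing the actual system uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical tool signature: a tool takes a query string and returns text evidence.
Tool = Callable[[str], str]


def retrieve_from_reports(query: str) -> str:
    # Stub for retrieval augmentation over sustainability reports (hypothetical).
    return f"[retrieved passage for: {query}]"


def web_search(query: str) -> str:
    # Stub for a web search call; no network access is made (hypothetical).
    return f"[web result for: {query}]"


def esg_metric_calculator(query: str) -> str:
    # Stub for a domain-specific function such as an ESG metric calculator (hypothetical).
    return f"[metric computed for: {query}]"


@dataclass
class WorkerAgent:
    """Leaf agent bound to exactly one tool."""
    name: str
    tool: Tool

    def run(self, subtask: str) -> str:
        return f"{self.name}: {self.tool(subtask)}"


@dataclass
class PlannerAgent:
    """Top-level agent: splits a task into subtasks and dispatches each to a worker."""
    workers: Dict[str, WorkerAgent]
    route: Callable[[str], str]  # maps a subtask to a worker name

    def plan(self, task: str) -> List[str]:
        # Naive decomposition: one subtask per non-empty ';'-separated clause.
        return [part.strip() for part in task.split(";") if part.strip()]

    def run(self, task: str) -> List[str]:
        findings = []
        for subtask in self.plan(task):
            worker = self.workers[self.route(subtask)]
            findings.append(worker.run(subtask))
        return findings


def keyword_route(subtask: str) -> str:
    # Hypothetical keyword router; a real agent system would let an LLM choose the tool.
    text = subtask.lower()
    if "news" in text or "controvers" in text:
        return "search"
    if "intensity" in text or "ratio" in text:
        return "calculator"
    return "retriever"


planner = PlannerAgent(
    workers={
        "retriever": WorkerAgent("retriever", retrieve_from_reports),
        "search": WorkerAgent("search", web_search),
        "calculator": WorkerAgent("calculator", esg_metric_calculator),
    },
    route=keyword_route,
)

for line in planner.run(
    "Scope 1 emissions in 2023; recent news on labour disputes; carbon intensity ratio"
):
    print(line)
```

Running it prints one finding per subtask, routed to the retriever, the search worker, and the calculator in that order. Keeping the planner separate from the tool-bound workers means new domain functions can be added as workers without changing the planning logic.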