Corporate greenhouse gas (GHG) emission targets are important metrics in sustainable investing [12, 16]. To provide a comprehensive view of company emission objectives, we propose an approach to source these metrics from public company disclosures. Curating these metrics manually is labor-intensive: it requires combing through lengthy corporate sustainability disclosures that often do not follow a standard format. The resulting dataset must also be validated thoroughly by Subject Matter Experts (SMEs), which further lengthens the time-to-market. We introduce the Climate Artificial Intelligence for Corporate Decarbonization Metrics Extraction (CAI) model and pipeline, a novel approach that uses Large Language Models (LLMs) to extract and validate linked metrics from corporate disclosures. We demonstrate that the process improves data collection efficiency and accuracy by automating data curation, validation, and metric scoring from public corporate disclosures. We further show that our results are agnostic to the choice of LLM. This framework can be applied broadly to information extraction from textual data.