Large Language Models (LLMs) demonstrate exceptional performance in textual understanding and tabular reasoning tasks. However, their ability to comprehend and analyze hybrid text, which combines textual and tabular data, remains underexplored. In this research, we focus on harnessing LLMs to extract critical information from financial reports, which are long hybrid documents. We propose an Automated Financial Information Extraction (AFIE) framework that enhances LLMs' ability to comprehend and extract information from financial reports. To evaluate AFIE, we develop a Financial Reports Numerical Extraction (FINE) dataset and conduct an extensive experimental analysis. We validate the framework on GPT-3.5 and GPT-4, where it yields average accuracy increases of 53.94% and 33.77%, respectively, compared to a naive method. These results suggest that the AFIE framework improves the accuracy of automated numerical extraction from complex, hybrid documents.