FLAG: Financial Long Document Classification via AMR-based GNN

From arXiv: 8 pages, 3 figures. To be published at the CIFEr 2024 conference as "Semantic Graph Learning for Trend Prediction from Long Financial Documents".

The advent of large language models (LLMs) has prompted much research into their financial applications. However, when LLMs are applied to long documents, semantic relations are not explicitly incorporated, and either a full or an arbitrarily sparse attention operation is employed. In recent years, progress has been made in Abstract Meaning Representation (AMR), a graph-based representation of text that preserves its semantic relations. Because AMR represents semantic relationships at a deeper level, graph neural networks (GNNs) can use it to construct effective document-level graph representations, built upon LLM embeddings, for predicting target metrics in the financial domain. We propose FLAG (Financial Long document classification via AMR-based GNN), an AMR-graph-based framework that generates document-level embeddings for long financial document classification. We construct document-level graphs from sentence-level AMR graphs, endow them with word embeddings from an LLM specialized for the financial domain, apply a deep learning mechanism that utilizes a GNN, and examine the efficacy of our AMR-based approach in predicting labeled target data from long financial documents. Extensive experiments are conducted on a dataset of quarterly earnings call transcripts of companies in various sectors of the economy, as well as on a corpus of more recent earnings calls of companies in the S&P 1500 Composite Index. We find that our AMR-based approach outperforms fine-tuning LLMs directly on text in predicting stock price movement trends at different time horizons on both datasets. Our approach also outperforms previous work that uses document graphs and GNNs for text classification.
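
The pipeline the abstract describes (sentence-level AMR graphs, merged into a document-level graph, node embeddings, a GNN, a document embedding) can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the hand-written triples stand in for AMR parser output, sentences are merged by sharing nodes with identical concept labels and by linking each sentence root to a virtual `DOC` node, `toy_embedding` is a meaningless placeholder for financial-domain LLM embeddings, and the GNN is reduced to a single mean-aggregation step with no learned weights.

```python
from collections import defaultdict

# Hypothetical sentence-level AMR graphs as (source, relation, target) triples.
# In practice these would come from an AMR parser; here they are hand-written.
sentence_graphs = [
    [("report-01", ":ARG0", "company"), ("report-01", ":ARG1", "revenue")],
    [("grow-01", ":ARG1", "revenue"), ("grow-01", ":degree", "strong")],
]


def build_document_graph(graphs):
    """Merge sentence graphs into one undirected document graph.

    Assumptions (not from the paper): nodes with the same concept label are
    shared across sentences, and each sentence root is tied to a virtual
    'DOC' node so the document graph is connected.
    """
    adjacency = defaultdict(set)
    for triples in graphs:
        root = triples[0][0]  # assume the first triple starts at the root
        adjacency["DOC"].add(root)
        adjacency[root].add("DOC")
        for source, _relation, target in triples:
            adjacency[source].add(target)
            adjacency[target].add(source)  # relation labels are ignored here
    return adjacency


def toy_embedding(label, dim=4):
    """Placeholder for an LLM word embedding: deterministic but meaningless."""
    seed = sum(map(ord, label))
    return [((seed * (i + 1)) % 7) / 7.0 for i in range(dim)]


def message_passing_step(adjacency, features):
    """One mean-aggregation step: a node averages itself and its neighbors."""
    updated = {}
    for node, neighbors in adjacency.items():
        group = [features[node]] + [features[n] for n in sorted(neighbors)]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


def readout(features):
    """Mean-pool all node vectors into one document-level embedding."""
    vectors = list(features.values())
    return [sum(col) / len(vectors) for col in zip(*vectors)]


graph = build_document_graph(sentence_graphs)
features = {node: toy_embedding(node) for node in graph}
doc_embedding = readout(message_passing_step(graph, features))
```

In a real system, the resulting `doc_embedding` would feed a classifier that predicts the stock price trend label, and the aggregation step would carry trainable parameters. The point of the sketch is only that the shared `revenue` node lets information flow between the two sentences, which is the kind of cross-sentence semantic link a flat token sequence does not make explicit.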
