In this paper, we introduce the Financial-STS task, a financial domain-specific NLP task designed to measure the nuanced semantic similarity between pairs of financial narratives. These narratives originate from the financial statements of the same company but correspond to different periods, such as year-over-year comparisons. Measuring the subtle semantic differences between these paired narratives enables market stakeholders to gauge changes over time in the company's financial and operational situations, which is critical for financial decision-making. We find that existing pretrained embedding models and LLM embeddings fall short in discerning these subtle financial narrative shifts. To address this gap, we propose an LLM-augmented pipeline specifically designed for the Financial-STS task. Evaluation on a human-annotated dataset demonstrates that our proposed method outperforms existing methods trained on classic STS tasks and generic LLM embeddings.