Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance

While large language models (LLMs) have shown strong general reasoning capabilities, their effectiveness in financial reasoning, which is crucial for real-world financial applications, remains underexplored. In this study, we conduct a comprehensive evaluation of 24 state-of-the-art general and reasoning-focused LLMs across four complex financial reasoning tasks involving financial text, tabular data, and equations. We assess key capabilities such as numerical reasoning, tabular interpretation, financial terminology comprehension, long-context understanding, and equation-based problem solving. Our analysis reveals that while data quality and pretraining contribute to performance, general techniques like chain-of-thought (CoT) fine-tuning offer limited gains in financial tasks. To address this, we propose two domain-adapted models, Fino1-8B and Fino1-14B, trained with CoT fine-tuning and reinforcement learning using domain-specific reasoning paths. Our models are trained on a carefully curated dataset integrating high-quality examples from diverse sources, covering financial reports, tables, equations, and structured XBRL texts. Despite limited training data, they achieve a 7-9% performance improvement, outperforming several advanced LLMs, including GPT-o1, GPT-o3-mini, and GPT-4.5, and performing comparably to DeepSeek models (V3 and R1), demonstrating strong practical value in resource-constrained scenarios. Our findings highlight the need for domain-specific adaptations in financial reasoning, and we release all datasets, models, and code for future research.
