FinGPT: Democratizing Internet-scale Data for Financial Large Language Models

Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating human-like text, which may revolutionize the finance industry. However, existing LLMs often fall short in the financial domain, mainly because of the disparities between general text data and financial text data. Unfortunately, only a limited number of financial text datasets are available, and BloombergGPT, the first financial LLM (FinLLM), is closed-source (only the training logs were released). In light of this, we aim to democratize Internet-scale financial data for LLMs, which remains an open challenge because of diverse data sources, a low signal-to-noise ratio, and high time sensitivity. To address these challenges, we introduce an open-source, data-centric framework, Financial Generative Pre-trained Transformer (FinGPT), that automates the collection and curation of real-time financial data from 34 diverse sources on the Internet, providing researchers and practitioners with accessible and transparent resources to develop their own FinLLMs. Additionally, we propose a simple yet effective strategy for fine-tuning FinLLMs using the inherent feedback from the market, dubbed Reinforcement Learning with Stock Prices (RLSP). We also adopt Low-Rank Adaptation methods (LoRA and QLoRA), which enable users to customize their own FinLLMs from general-purpose LLMs at low cost. Finally, we showcase several FinGPT applications, including a robo-advisor, sentiment analysis for algorithmic trading, and low-code development. FinGPT aims to democratize FinLLMs, stimulate innovation, and unlock new opportunities in open finance. The code has been open-sourced.
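
To make the low-cost claim for LoRA concrete, the following is a minimal sketch of the general low-rank adaptation idea, not FinGPT's actual implementation. It assumes a single frozen weight matrix W of shape (d_out, d_in) and trainable factors B (d_out x r) and A (r x d_in), so that the adapted layer computes h = W x + (alpha / r) B (A x). The function names, the layer sizes, and the scaling value alpha are illustrative assumptions.

```python
# Illustrative sketch of low-rank adaptation (LoRA) in pure Python.
# All names and sizes here are hypothetical; this is not FinGPT code.

def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora): trainable parameters when updating the whole
    weight matrix versus only the two low-rank factors B and A."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha, rank):
    """Compute h = W x + (alpha / rank) * B (A x).

    W stays frozen; only A and B would be trained. When B is all zeros
    (a common initialization), the output equals the base model's W x.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    # Assumed example size: one 4096 x 4096 projection with rank 8.
    full, lora = lora_param_counts(4096, 4096, 8)
    print(f"full fine-tuning: {full} parameters")
    print(f"LoRA (rank 8):    {lora} parameters ({100 * lora / full:.2f}%)")
```

Under these assumed sizes, the adapter trains 65,536 parameters instead of 16,777,216 for that one matrix, which is the source of the cost reduction. QLoRA additionally stores the frozen weights in quantized form, which this sketch does not model.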
