FinGPT: Democratizing Internet-scale Data for Financial Large Language Models

Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating human-like text, which could revolutionize the finance industry. However, existing LLMs often fall short in the financial domain, mainly because of the disparities between general text data and financial text data. Unfortunately, only a limited number of financial text datasets are available, and they are quite small, while BloombergGPT, the first financial LLM (FinLLM), is closed-source (only the training logs were released). In light of this, we aim to democratize Internet-scale financial data for LLMs, which is an open challenge due to diverse data sources, a low signal-to-noise ratio, and high time-validity. To address these challenges, we introduce an open-source, data-centric framework, Financial Generative Pre-trained Transformer (FinGPT), that automates the collection and curation of real-time financial data from more than 34 diverse sources on the Internet, providing researchers and practitioners with accessible and transparent resources to develop their own FinLLMs. Additionally, we propose a simple yet effective strategy for fine-tuning FinLLMs using the inherent feedback from the market, dubbed Reinforcement Learning with Stock Prices (RLSP). We also adopt low-rank adaptation methods (LoRA, QLoRA) that enable users to customize their own FinLLMs from open-source general-purpose LLMs at a low cost. Finally, we showcase several FinGPT applications, including a robo-advisor, sentiment analysis for algorithmic trading, and low-code development. FinGPT aims to democratize FinLLMs, stimulate innovation, and unlock new opportunities in open finance. The code is available at https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP
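
To make the low-cost claim for low-rank adaptation concrete, the following is a minimal sketch of the general LoRA idea, not the FinGPT implementation. It assumes a frozen weight matrix W of shape (d_out, d_in) that is adapted as W + (alpha / rank) * B * A, where only B (d_out x rank) and A (rank x d_in) are trained. The function names, the 4096 x 4096 layer size, and rank 8 are illustrative assumptions and do not come from the paper.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for one weight matrix: full fine-tuning vs. LoRA."""
    full = d_out * d_in            # every entry of W is trained
    lora = rank * (d_out + d_in)   # only B (d_out x rank) and A (rank x d_in)
    return full, lora


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute h = W x + (alpha / rank) * B (A x) with W kept frozen."""
    base = matvec(W, x)                # frozen pretrained path
    update = matvec(B, matvec(A, x))   # low-rank trainable path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    full, lora = lora_param_counts(4096, 4096, 8)
    print(f"full: {full}, LoRA: {lora}, reduction: {full // lora}x")

    W = [[1.0, 2.0], [3.0, 4.0]]
    A = [[1.0, 0.0]]        # rank 1, d_in = 2
    B = [[0.0], [0.0]]      # zero-initialized B leaves the base output unchanged
    print(lora_forward(W, A, B, [1.0, 1.0], alpha=2.0, rank=1))
```

Under these assumptions, the number of trainable parameters for the layer shrinks by the ratio d_out * d_in / (rank * (d_out + d_in)), which is where the cost saving comes from. QLoRA additionally stores the frozen weights in a quantized format, which this sketch does not model.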
