Text-to-SQL, which provides a zero-code interface for operating relational databases, has gained much attention in financial analysis because financial professionals may not be well versed in SQL programming. However, there is so far no practical Text-to-SQL benchmark dataset for financial analysis, and existing Text-to-SQL methods have not considered the unique characteristics of databases in financial applications, such as the prevalence of wide tables. To address these issues, we collect a practical Text-to-SQL benchmark dataset and propose a model-agnostic, Large Language Model (LLM)-based Text-to-SQL framework for financial analysis. The benchmark dataset, BULL, is collected from the practical financial analysis business of Hundsun Technologies Inc. and includes databases for funds, stocks, and the macro economy. The proposed LLM-based Text-to-SQL framework, FinSQL, provides a systematic treatment of financial Text-to-SQL from the perspectives of prompt construction, parameter-efficient fine-tuning, and output calibration. Extensive experimental results on BULL demonstrate that FinSQL achieves state-of-the-art Text-to-SQL performance at a small cost; furthermore, FinSQL brings up to a 36.64% performance improvement in scenarios requiring few-shot cross-database model transfer.