Large Language Models (LLMs) have shown impressive capabilities in transforming natural language questions about relational databases into SQL queries. Despite recent improvements, small LLMs struggle to handle questions involving multiple tables and complex SQL patterns under a Zero-Shot Learning (ZSL) setting. Supervised Fine-Tuning (SFT) partially compensates for the knowledge deficits in pretrained models but falls short when dealing with queries that require multi-hop reasoning. To bridge this gap, different LLM training strategies to reinforce reasoning capabilities have been proposed, ranging from leveraging a thinking process within ZSL, to including reasoning traces in SFT, to adopting Reinforcement Learning (RL) strategies. However, the influence of reasoning on Text2SQL performance is still largely unexplored. This paper investigates to what extent LLM reasoning capabilities influence Text2SQL performance on four benchmark datasets. To this end, it considers the following LLM settings: (1) ZSL, with and without general-purpose reasoning; (2) SFT, with and without task-specific reasoning traces; (3) RL, leveraging execution accuracy as the primary reward function; (4) SFT+RL, i.e., a two-stage approach that combines SFT and RL. The results show that general-purpose reasoning under ZSL proves ineffective in tackling complex Text2SQL cases. Small LLMs benefit from SFT with reasoning much more than larger ones, compensating for their (weaker) pretraining. RL is generally beneficial across all tested models and datasets, particularly when SQL queries involve multi-hop reasoning and multiple tables. Small LLMs with SFT+RL excel on the most complex datasets thanks to a strategic balance between the generality of the reasoning process and the optimization of execution accuracy. Thanks to RL, the 7B Qwen-Coder-2.5 model performs on par with 100+ billion-parameter ones on the Bird dataset.
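To make the RL setting concrete, the following is a minimal sketch of an execution-accuracy reward of the kind the abstract describes: the predicted query earns reward 1.0 only if its result set matches that of the gold query on the target database. The function name `execution_accuracy_reward` and the use of SQLite are illustrative assumptions, not the paper's actual implementation; comparing results as sets (ignoring row order and duplicates) is a common simplification.

```python
import sqlite3

def execution_accuracy_reward(pred_sql: str, gold_sql: str, db_path: str) -> float:
    """Illustrative execution-accuracy reward (hypothetical helper, not the
    paper's exact code): 1.0 if the predicted query returns the same result
    set as the gold query, else 0.0."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        cur.execute(gold_sql)
        gold_rows = set(map(tuple, cur.fetchall()))
        cur.execute(pred_sql)
        pred_rows = set(map(tuple, cur.fetchall()))
        return 1.0 if pred_rows == gold_rows else 0.0
    except sqlite3.Error:
        # Invalid or non-executable SQL earns no reward.
        return 0.0
    finally:
        conn.close()
```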