Generating accurate SQL for user queries (text-to-SQL) is a long-standing problem, since generating the SQL requires comprehending both the query and the database and then retrieving the correct data from the database accordingly. Existing models rely on the comprehensive ability of Large Language Models (LLMs) to generate SQL according to the database schema. However, some necessary knowledge is not explicitly included in the database schema or has not been learned by LLMs. As a result, the SQL generated for such knowledge-insufficient queries may be inaccurate, which undermines the robustness of text-to-SQL models. To address this, we propose the Knowledge-to-SQL framework, which employs a tailored Data Expert LLM (DELLM) to provide helpful knowledge for all types of text-to-SQL models. Specifically, we present the detailed design of DELLM, covering table reading and the basic fine-tuning process. We further propose a Preference Learning via Database Feedback (PLDBF) training strategy that guides DELLM to generate knowledge that is more helpful to LLMs. Extensive experiments verify that DELLM can enhance state-of-the-art LLMs on text-to-SQL tasks. The model structure and parameter weights of DELLM are released for further research.
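The abstract does not spell out how database feedback becomes preference data, so the following is only a minimal sketch under stated assumptions. It assumes that a piece of candidate knowledge counts as "helpful" when the SQL a text-to-SQL model generates with it returns the same result set as the gold SQL on the actual database, and that helpful and unhelpful candidates are then paired as (chosen, rejected) examples for a pairwise preference objective. All names here (`run_query`, `database_feedback`, `build_preference_pairs`, `toy_generate_sql`, the `students` table) are hypothetical, and `toy_generate_sql` is a rule-based stand-in for an LLM, not part of DELLM.

```python
import sqlite3
from collections import Counter
from typing import Callable, List, Optional, Tuple


def run_query(conn: sqlite3.Connection, sql: str) -> Optional[Counter]:
    """Execute SQL and return its rows as a multiset (order-insensitive), or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def database_feedback(conn: sqlite3.Connection, sql: str, gold_sql: str) -> bool:
    """Assumed feedback signal: the predicted SQL runs and matches the gold result set."""
    pred = run_query(conn, sql)
    return pred is not None and pred == run_query(conn, gold_sql)


def build_preference_pairs(
    conn: sqlite3.Connection,
    question: str,
    gold_sql: str,
    candidates: List[str],
    generate_sql: Callable[[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Label each candidate knowledge string by execution feedback, then pair helpful vs. unhelpful."""
    helpful, unhelpful = [], []
    for knowledge in candidates:
        sql = generate_sql(question, knowledge)
        (helpful if database_feedback(conn, sql, gold_sql) else unhelpful).append(knowledge)
    # Each (question, chosen, rejected) triple could feed a pairwise preference loss.
    return [(question, good, bad) for good in helpful for bad in unhelpful]


def toy_generate_sql(question: str, knowledge: str) -> str:
    """Hypothetical stand-in for an LLM text-to-SQL model: reads a score threshold from the knowledge."""
    threshold = int(knowledge.rsplit(" ", 1)[-1])
    return f"SELECT name FROM students WHERE score >= {threshold}"


def make_toy_db() -> sqlite3.Connection:
    """Build a tiny in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO students VALUES (?, ?)",
        [("Ann", 95), ("Bob", 72), ("Cal", 40)],
    )
    return conn


if __name__ == "__main__":
    conn = make_toy_db()
    pairs = build_preference_pairs(
        conn,
        "Which students passed?",
        "SELECT name FROM students WHERE score >= 60",
        ["passing means score >= 60", "passing means score >= 90"],
        toy_generate_sql,
    )
    print(pairs)
```

In this toy run, the knowledge "passing means score >= 60" leads to SQL whose result matches the gold query, while "passing means score >= 90" does not, so the sketch yields a single (chosen, rejected) pair. Comparing result multisets rather than SQL strings is a deliberate choice here: it rewards knowledge that produces correct answers even when the generated SQL differs syntactically from the gold query.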