Text-to-SQL parsing translates natural language queries (NLQs) into their corresponding SQL commands. A principal challenge in this domain is formulating SQL queries that are not only syntactically correct but also semantically aligned with the natural language input, a task made difficult by the intrinsic disparity between NLQ and SQL. In this work, we introduce Keyword Instruction (KeyInst), a novel method designed to enhance SQL formulation by Large Language Models (LLMs). KeyInst provides guidance on the pivotal SQL keywords likely to be part of the final query, thereby facilitating a smoother SQL formulation process. We explore two strategies for integrating KeyInst into Text-to-SQL parsing: a pipeline strategy and a single-pass strategy. The former first generates KeyInst for the question, which is then used to prompt LLMs. The latter employs a fine-tuned model to generate KeyInst and SQL concurrently in one step. We also develop StrucQL, a benchmark specifically designed for the evaluation of SQL formulation. Extensive experiments on StrucQL and other benchmarks demonstrate that KeyInst significantly improves upon existing Text-to-SQL prompting techniques.
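To make the pipeline strategy concrete, the following is a minimal sketch of its two-stage flow: keyword hints are produced first and then placed into the prompt given to the LLM. It is an illustration under stated assumptions, not the authors' implementation. The function names (`toy_keyinst`, `build_prompt`, `run_pipeline`), the prompt layout, and the rule-based hint generator are all hypothetical; in the method described above, the hints would come from a model rather than hand-written rules.

```python
import re
from typing import Callable, List


def toy_keyinst(question: str) -> List[str]:
    """Hypothetical stand-in for the KeyInst generation stage.

    Hand-written rules are used here only so the sketch runs without a model;
    they are not how KeyInst is produced in the paper.
    """
    q = question.lower()
    words = set(re.findall(r"[a-z]+", q))
    hints: List[str] = []
    if words & {"each", "per"}:
        hints.append("GROUP BY")
    if words & {"most", "highest", "top"}:
        hints.extend(["ORDER BY", "LIMIT"])
    if "how many" in q or "number of" in q:
        hints.append("COUNT")
    return hints


def build_prompt(question: str, schema: str, keyinst: List[str]) -> str:
    """Assemble a prompt that exposes the keyword hints to the LLM.

    The layout is an assumption made for illustration.
    """
    hint_line = ", ".join(keyinst) if keyinst else "(none)"
    return (
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
        f"Keywords likely needed in the SQL: {hint_line}\n"
        "SQL:"
    )


def run_pipeline(
    question: str,
    schema: str,
    keyinst_fn: Callable[[str], List[str]],
    llm_fn: Callable[[str], str],
) -> str:
    """Stage 1: generate keyword hints. Stage 2: prompt the LLM with them."""
    keyinst = keyinst_fn(question)
    prompt = build_prompt(question, schema, keyinst)
    return llm_fn(prompt)
```

Here `llm_fn` is any callable that maps a prompt string to generated text, so the same skeleton works with a real model client or a stub. The single-pass strategy would replace both stages with one fine-tuned model that emits the hints and the SQL together.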