The conversion of natural language queries into SQL queries, known as Text-to-SQL, is a critical yet challenging task. This paper introduces EPI-SQL, a novel methodological framework that leverages Large Language Models (LLMs) to improve performance on Text-to-SQL tasks. EPI-SQL operates in four steps. First, it gathers instances from the Spider dataset on which LLMs are prone to fail. Second, these instances are used to generate general error-prevention instructions (EPIs). Third, LLMs craft contextualized EPIs tailored to the specific context of the current task. Finally, these context-specific EPIs are incorporated into the prompt used for SQL generation. EPI-SQL is distinguished by providing task-specific guidance, which enables the model to avoid potential errors on the task at hand. Notably, although EPI-SQL is a zero-shot approach, its performance rivals that of advanced few-shot methods. An empirical assessment on the Spider benchmark shows that EPI-SQL achieves an execution accuracy of 85.1\%, underscoring its effectiveness in generating accurate SQL queries with LLMs. The findings point to a promising direction for future research, i.e., enhancing instructions with task-specific and contextualized rules to boost LLMs' performance on NLP tasks.
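The four steps can be illustrated as a small prompt pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the `llm` callable (any function mapping a prompt string to model output), the `is_correct` checker (for example, an execution-result match), the `Example` record, all function names, and all prompt wording are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to model output.
LLM = Callable[[str], str]


@dataclass
class Example:
    # Hypothetical record for one Text-to-SQL training instance.
    question: str
    schema: str
    gold_sql: str


def collect_failures(train: List[Example], llm: LLM,
                     is_correct: Callable[[str, str], bool]) -> List[Example]:
    """Step 1: keep training instances on which the LLM's SQL is judged wrong.
    `is_correct(pred, gold)` is an assumed checker, e.g. execution-result match."""
    failures = []
    for ex in train:
        pred = llm(f"Schema:\n{ex.schema}\nQuestion: {ex.question}\nSQL:")
        if not is_correct(pred, ex.gold_sql):
            failures.append(ex)
    return failures


def general_epis(failures: List[Example], llm: LLM) -> List[str]:
    """Step 2: ask the LLM to abstract the failures into general error-prevention rules."""
    cases = "\n\n".join(f"Q: {f.question}\nCorrect SQL: {f.gold_sql}" for f in failures)
    reply = llm("Write one general rule per line that would prevent "
                f"mistakes on these cases:\n{cases}")
    # One rule per non-empty line of the reply.
    return [line.strip() for line in reply.splitlines() if line.strip()]


def contextualize_epis(epis: List[str], question: str, schema: str, llm: LLM) -> str:
    """Step 3: have the LLM specialize the general rules to the current question and schema."""
    rules = "\n".join(f"- {r}" for r in epis)
    return llm(f"Rules:\n{rules}\nSchema:\n{schema}\nQuestion: {question}\n"
               "Rewrite only the rules relevant to this question, specialized to it:")


def build_sql_prompt(question: str, schema: str, context_epis: str) -> str:
    """Step 4: zero-shot SQL-generation prompt (no demonstrations) carrying the contextualized EPIs."""
    return (f"Schema:\n{schema}\n"
            f"Avoid these errors:\n{context_epis}\n"
            f"Question: {question}\nSQL:")
```

In this sketch, steps 1 and 2 run once offline over the training data, while steps 3 and 4 run per test question. The final prompt contains no worked examples, which is what keeps the approach zero-shot.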