Semantic Parsing for Complex Data Retrieval: Targeting Query Plans vs. SQL for No-Code Access to Relational Databases

Large Language Models (LLMs) have spurred progress in text-to-SQL, the task of generating SQL queries from natural language questions over a given database schema. Although SQL is declarative, it remains a complex programming language. In this paper, we investigate an alternative query language with simpler syntax and a modular specification of complex queries. The goal is a query language that modern neural semantic parsing architectures can learn more easily and that lets non-programmers better assess the validity of the query plans produced by an interactive query plan assistant. The proposed language, Query Plan Language (QPL), is modular by design and can be translated into a restricted form of SQL Common Table Expressions (CTEs). QPL aims to make complex data retrieval accessible to non-programmers: users express their questions in natural language, and the generated query is written in a target language that is easier to verify. We demonstrate how LLMs can exploit QPL's modularity to generate complex query plans compositionally, through a question decomposition strategy and a planning stage. We conduct experiments on a version of the Spider text-to-SQL dataset that has been converted to QPL. The hierarchical structure of QPL programs provides a natural measure of query complexity. Using this measure, we find that existing text-to-SQL systems have low accuracy on complex compositional queries. We present ways to address complex queries in an iterative, user-controlled manner, using fine-tuned LLMs and a variety of prompting strategies applied compositionally.
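To make the idea of a modular plan that compiles to CTEs more concrete, the following is a minimal sketch and not the actual QPL syntax or compiler. The `Step` class, the `to_cte` and `depth` helpers, the step names, and the toy `singer`/`concert` schema are all hypothetical, chosen only for illustration. Each step is a single simple query over base tables or earlier steps; the steps are chained into one `WITH` query, and the length of the longest dependency chain serves as a rough stand-in for a hierarchy-based complexity measure.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Step:
    """One hypothetical plan step (illustrative only, not real QPL)."""
    name: str            # name of the intermediate result; becomes the CTE name
    sql_body: str        # one simple SELECT over base tables or earlier steps
    inputs: tuple = ()   # names of the earlier steps this step consumes


def to_cte(steps):
    """Chain the steps into a single WITH query; the last step is the answer."""
    ctes = ",\n".join(f"{s.name} AS ({s.sql_body})" for s in steps)
    return f"WITH {ctes}\nSELECT * FROM {steps[-1].name}"


def depth(steps):
    """Longest dependency chain among the steps: a rough complexity proxy."""
    d = {}
    for s in steps:  # steps are assumed to be listed in dependency order
        d[s.name] = 1 + max((d[i] for i in s.inputs), default=0)
    return max(d.values())


# Question (assumed): "Which singers have given more than one concert?"
PLAN = [
    Step("scan_concert", "SELECT singer_id FROM concert"),
    Step("agg",
         "SELECT singer_id, COUNT(*) AS n FROM scan_concert GROUP BY singer_id",
         ("scan_concert",)),
    Step("filt", "SELECT singer_id FROM agg WHERE n > 1", ("agg",)),
    Step("scan_singer", "SELECT id, name FROM singer"),
    Step("joined",
         "SELECT scan_singer.name FROM filt "
         "JOIN scan_singer ON filt.singer_id = scan_singer.id",
         ("filt", "scan_singer")),
]


def run_example():
    """Build a toy in-memory database and execute the compiled plan."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE concert (singer_id INTEGER, year INTEGER);
        INSERT INTO singer VALUES (1, 'Ana'), (2, 'Ben');
        INSERT INTO concert VALUES (1, 2020), (1, 2021), (2, 2021);
    """)
    rows = conn.execute(to_cte(PLAN)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(to_cte(PLAN))
    print(run_example())  # [('Ana',)]
    print(depth(PLAN))    # 4
```

Because each step names its own intermediate result, a reader can check the plan one step at a time (scan, aggregate, filter, join) instead of parsing a single nested SQL statement, which is the kind of verifiability the abstract argues for.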
