We study Text-to-SQL semantic parsing from the perspective of retrieval-augmented generation. Motivated by the size of commercial database schemata and the deployability of business intelligence solutions, we propose $\text{ASTReS}$, which dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. Furthermore, we investigate the extent to which a semantic parser running in parallel can be leveraged to generate approximated versions of the expected SQL queries in support of our retrieval. We take this approach to the extreme: we adapt a model of fewer than $500$M parameters to act as an extremely efficient approximator, enhancing it with the ability to process schemata in a parallelised manner. We apply $\text{ASTReS}$ to monolingual and cross-lingual benchmarks for semantic parsing, showing improvements over state-of-the-art baselines. Comprehensive experiments highlight the contribution of the modules involved in this retrieval-augmented generation setting and reveal directions for future work.
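To make the idea of selecting few-shot examples through abstract syntax trees more concrete, the following is a minimal sketch, not the method used by $\text{ASTReS}$. The abstract does not specify the parser or the similarity measure, so everything here is an assumption made for illustration: a toy structural tree built from clause keywords and parenthesis nesting stands in for a real SQL AST, Jaccard overlap of root-to-node paths stands in for the tree similarity, and the names `structural_paths`, `select_few_shot`, and `POOL` are hypothetical.

```python
import re
from typing import List, Set, Tuple

# Assumption: a tiny keyword inventory replaces a full SQL grammar.
CLAUSES = {"SELECT", "FROM", "WHERE", "GROUP", "HAVING", "ORDER",
           "LIMIT", "JOIN", "UNION", "INTERSECT", "EXCEPT"}
AGGREGATES = {"COUNT", "SUM", "AVG", "MIN", "MAX"}

Path = Tuple[str, ...]


def tokenize(sql: str) -> List[str]:
    """Split SQL into identifiers, numbers, string literals, and symbols."""
    return re.findall(r"[A-Za-z_][A-Za-z_0-9.]*|\d+|'[^']*'|[()*,]|[<>=!]+", sql)


def structural_paths(sql: str) -> Set[Path]:
    """Return root-to-node paths of a toy syntax tree.

    Clause keywords become children of the enclosing parenthesis context,
    and aggregate functions become children of the current clause.
    Table names, column names, and literals are ignored on purpose.
    """
    paths: Set[Path] = set()
    stack: List[Path] = [("ROOT",)]  # one context per open parenthesis
    current: Path = ("ROOT",)
    for tok in tokenize(sql):
        upper = tok.upper()
        if tok == "(":
            stack.append(current)
        elif tok == ")":
            if len(stack) > 1:
                current = stack.pop()
        elif upper in CLAUSES:
            current = stack[-1] + (upper,)
            paths.add(current)
        elif upper in AGGREGATES:
            paths.add(current + (upper,))
    return paths


def jaccard(a: Set[Path], b: Set[Path]) -> float:
    """Jaccard overlap of two path sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_few_shot(approx_sql: str,
                    pool: List[Tuple[str, str]],
                    k: int = 2) -> List[Tuple[str, str]]:
    """Rank (question, sql) pairs by structural similarity to approx_sql."""
    target = structural_paths(approx_sql)
    ranked = sorted(pool,
                    key=lambda ex: jaccard(target, structural_paths(ex[1])),
                    reverse=True)
    return ranked[:k]


# Hypothetical example pool; the questions and queries are invented.
POOL = [
    ("How many singers are there?",
     "SELECT COUNT(*) FROM singer"),
    ("List singers older than the average age.",
     "SELECT name FROM singer WHERE age > (SELECT AVG(age) FROM singer)"),
    ("Show each country once.",
     "SELECT country FROM singer GROUP BY country"),
]

# Stand-in for the output of the small approximating parser.
APPROX_SQL = "SELECT title FROM song WHERE length > (SELECT AVG(length) FROM song)"
```

Calling `select_few_shot(APPROX_SQL, POOL, k=1)` returns the nested-subquery example, because its structure matches the approximated query even though the tables and columns differ. This is the property that makes an approximate SQL query useful as a retrieval key: it only needs to get the shape of the query right, not its exact contents.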