Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements, and such evolution often degrades the performance of models trained on static schemas. Existing work either focuses mainly on simple paraphrases of syntactic or semantic mappings among the NLQ, database, and SQL, or lacks a comprehensive and controllable way to investigate model robustness under schema evolution; both are insufficient for the increasingly complex and varied schema changes encountered in practice, especially in the LLM era. To address these challenges, we present EvoSchema, a comprehensive benchmark designed to assess and enhance the robustness of text-to-SQL systems under real-world schema changes. EvoSchema introduces a novel schema evolution taxonomy encompassing ten perturbation types across column-level and table-level modifications, systematically simulating the dynamic nature of database schemas. Using EvoSchema, we conduct an in-depth evaluation of open-source and closed-source LLMs, revealing that table-level perturbations degrade model performance significantly more than column-level changes. Furthermore, EvoSchema informs the development of more resilient text-to-SQL systems, in terms of both model training and database design. Training on EvoSchema's diverse schema designs forces a model to distinguish schema differences for the same question and thus avoid learning spurious patterns; models trained this way are, on average, markedly more robust than those trained on unperturbed data. This benchmark offers valuable insights into model behavior and a path forward for designing systems capable of thriving in dynamic, real-world environments.
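To make the idea of a schema perturbation concrete, the following is a minimal illustrative sketch of one column-level perturbation type (a column rename) applied to a toy schema, together with the corresponding rewrite of a gold SQL query. The function names, schema representation, and the specific perturbation shown here are our own illustrative assumptions, not the paper's implementation.

```python
import re

def rename_column(schema, table, old, new):
    """Return a copy of `schema` (dict: table -> column list) with
    `table.old` renamed to `new` — a column-level perturbation."""
    perturbed = {t: list(cols) for t, cols in schema.items()}
    perturbed[table] = [new if c == old else c for c in perturbed[table]]
    return perturbed

def rewrite_sql(sql, old, new):
    """Naively update column references in the gold SQL using
    word-boundary matching (sufficient for this toy example)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, sql)

# A NLQ such as "Which employees earn over 50000?" stays fixed,
# while the schema and gold SQL evolve together under the perturbation.
schema = {"employees": ["id", "name", "salary"]}
perturbed = rename_column(schema, "employees", "salary", "annual_pay")
gold_sql = "SELECT name FROM employees WHERE salary > 50000"
new_sql = rewrite_sql(gold_sql, "salary", "annual_pay")
# perturbed  -> {"employees": ["id", "name", "annual_pay"]}
# new_sql    -> "SELECT name FROM employees WHERE annual_pay > 50000"
```

A robust model must answer the unchanged question correctly against both the original and perturbed schema; table-level perturbations (e.g., splitting or merging tables) alter the join structure as well, which is consistent with the abstract's finding that they are more damaging than column-level changes.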