Judging the equivalence of two SQL queries is a fundamental problem with many practical applications in data management and SQL generation (e.g., evaluating the quality of generated SQL queries in the text-to-SQL task). While the research community has reasoned about SQL equivalence for decades, the problem remains considerably difficult and no complete solution exists. Recently, Large Language Models (LLMs) have shown strong reasoning capability in conversation, question answering, and solving mathematical problems. In this paper, we study whether LLMs can be used to determine the equivalence of SQL queries under two notions of SQL equivalence (semantic equivalence and relaxed equivalence). To assist LLMs in generating high-quality responses, we present two prompting techniques: Miniature & Mull and Explain & Compare. The former is used to evaluate semantic equivalence: it asks the LLM to execute a query on a simple database instance and then explore whether a counterexample exists by modifying the database. The latter is used to evaluate relaxed equivalence: it asks the LLM to explain the queries and then compare whether they contain significant logical differences. Our experiments demonstrate that, with our techniques, LLMs are a promising tool for helping data engineers write semantically equivalent SQL queries, although challenges still persist, and a better metric for evaluating SQL generation than the popular execution accuracy.
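For concreteness, the sketch below shows how the two prompting techniques could be assembled as prompt templates. The prompt wordings, the `query_llm` helper, and the function names are illustrative assumptions for exposition, not the exact prompts or interface used in the paper.

```python
# Illustrative sketch of the two prompting techniques (assumed wording, not
# the paper's exact prompts). `query_llm` is a hypothetical helper that sends
# a prompt to an LLM and returns its text response.

def query_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in an LLM client here (hypothetical helper).")

def miniature_and_mull(q1: str, q2: str, schema: str) -> str:
    """Semantic equivalence: execute both queries on a small database
    instance, then try to modify the instance to expose a counterexample."""
    prompt = (
        f"Given the schema:\n{schema}\n"
        f"Construct a small database instance and execute both queries on it:\n"
        f"Q1: {q1}\nQ2: {q2}\n"
        "If their results agree, try modifying the instance to find a "
        "counterexample where the results differ. Conclude whether the two "
        "queries are semantically equivalent."
    )
    return query_llm(prompt)

def explain_and_compare(q1: str, q2: str, schema: str) -> str:
    """Relaxed equivalence: explain each query, then compare the explanations
    for significant logical differences."""
    prompt = (
        f"Given the schema:\n{schema}\n"
        f"Explain in plain language what each query computes:\n"
        f"Q1: {q1}\nQ2: {q2}\n"
        "Then compare the two explanations and state whether they contain "
        "significant logical differences."
    )
    return query_llm(prompt)
```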