Judging the equivalence between two SQL queries is a fundamental problem with many practical applications in data management and SQL generation (e.g., evaluating the quality of generated SQL queries in the text-to-SQL task). Although the research community has reasoned about SQL equivalence for decades, the problem poses considerable difficulties and no complete solution exists. Recently, Large Language Models (LLMs) have shown strong reasoning capability in conversation, question answering, and solving mathematical challenges. In this paper, we study whether LLMs can be used to determine the equivalence between SQL queries under two notions of SQL equivalence: semantic equivalence and relaxed equivalence. To assist LLMs in generating high-quality responses, we present two prompting techniques: Miniature & Mull and Explain & Compare. The former is used to evaluate semantic equivalence: it asks LLMs to execute a query on a simple database instance and then explore whether a counterexample exists by modifying the database. The latter is used to evaluate relaxed equivalence: it asks LLMs to explain the queries and then compare whether they contain significant logical differences. Our experiments demonstrate that, with our techniques, LLMs are a promising tool to help data engineers write semantically equivalent SQL queries, although challenges still persist, and that they provide a better metric for evaluating SQL generation than the popular execution accuracy.
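To make the counterexample idea behind Miniature & Mull concrete, the following is a minimal sketch that runs two queries on a tiny database instance and then on a modified one. It is an illustration only, not the prompting technique itself: a real SQLite engine stands in for the LLM's reasoning, and the `emp` table, the query pair `Q1`/`Q2`, and the helpers `run` and `agree` are all hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical query pair: they agree only while no matching row is duplicated.
Q1 = "SELECT name FROM emp WHERE dept = 'eng'"
Q2 = "SELECT DISTINCT name FROM emp WHERE dept = 'eng'"


def run(rows, query):
    """Execute a query on a fresh in-memory database holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)
    # Sort so results are compared as bags (multisets), ignoring row order.
    result = sorted(conn.execute(query).fetchall())
    conn.close()
    return result


def agree(rows):
    """Return True if both queries produce the same bag of rows on this instance."""
    return run(rows, Q1) == run(rows, Q2)


# Step 1: a miniature database instance on which the queries happen to agree.
miniature = [("ann", "eng"), ("bob", "ops")]

# Step 2: modify the instance (here, add a duplicate row) to look for a counterexample.
modified = miniature + [("ann", "eng")]

same_on_miniature = agree(miniature)
same_on_modified = agree(modified)
print(same_on_miniature, same_on_modified)
```

Under these assumptions, the two queries return the same result on the miniature instance but differ once the duplicate row is added, so the modified instance is a counterexample to their semantic equivalence.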