Query rewrite is essential for optimizing SQL queries to improve their execution efficiency without changing their results. Traditionally, this task has been tackled through heuristic and learning-based methods, each with its limitations in terms of inferior quality and low robustness. Recent advancements in LLMs offer a new paradigm by leveraging their superior natural language and code comprehension abilities. Despite their potential, directly applying LLMs like GPT-4 has faced challenges due to problems such as hallucinations, where the model might generate inaccurate or irrelevant results. To address this, we propose R-Bot, an LLM-based query rewrite system with a systematic approach. We first design a multi-source rewrite evidence preparation pipeline to generate query rewrite evidences for guiding LLMs to avoid hallucinations. We then propose a hybrid structure-semantics retrieval method that combines structural and semantic analysis to retrieve the most relevant rewrite evidences for effectively answering an online query. We next propose a step-by-step LLM rewrite method that iteratively leverages the retrieved evidences to select and arrange rewrite rules with self-reflection. We conduct comprehensive experiments on widely used benchmarks, and demonstrate the superior performance of our system, R-Bot, surpassing state-of-the-art query rewrite methods.
翻译:查询重写对于优化SQL查询以提升其执行效率至关重要,且不改变查询结果。传统上,该任务主要通过启发式方法和基于学习的方法来解决,但各自在质量不佳和鲁棒性较低方面存在局限。大语言模型(LLM)的最新进展提供了一种新范式,其利用卓越的自然语言与代码理解能力展现出潜力。然而,直接应用如GPT-4等LLM仍面临挑战,主要问题包括幻觉现象,即模型可能生成不准确或不相关的结果。为此,我们提出R-Bot,一种采用系统化方法的基于LLM的查询重写系统。我们首先设计了一个多源重写证据准备流程,用于生成查询重写证据以引导LLM避免幻觉。接着,我们提出一种混合结构-语义检索方法,结合结构与语义分析来检索最相关的重写证据,以有效响应在线查询。随后,我们提出一种分步式LLM重写方法,通过自反思机制迭代利用检索到的证据来筛选和排列重写规则。我们在广泛使用的基准测试上进行了全面实验,结果表明我们的系统R-Bot性能优异,超越了当前最先进的查询重写方法。