On the challenging task of text-to-SQL, as evaluated on datasets such as Spider, there is currently a significant performance gap between fine-tuned models and prompting approaches that use Large Language Models (LLMs). To improve the reasoning of LLMs on this task, we study how decomposing it into smaller sub-tasks can help. In particular, we show that breaking the generation problem into sub-problems and feeding the solutions of those sub-problems into LLMs significantly improves their performance. Our experiments with three LLMs show that this approach consistently improves their simple few-shot performance by roughly 10%, pushing the accuracy of LLMs towards the state of the art (SOTA) or beyond it. On the holdout test set of Spider, the previous SOTA in terms of execution accuracy was 79.9%, and the new SOTA at the time of this writing, using our approach, is 85.3%. Our in-context learning approach beats many heavily fine-tuned models by at least 5%. Additionally, when evaluated on the BIRD benchmark, our approach achieved an execution accuracy of 55.9%, setting a new SOTA on its holdout test set.
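The decomposition idea described in the abstract can be illustrated with a minimal sketch: each sub-problem is posed to the LLM in turn, and its solution is appended to the prompt used for the next step and for the final SQL generation. The abstract does not name the sub-tasks or the prompt format, so the `LLM` callable interface, the function `decomposed_text_to_sql`, the prompt wording, and the sub-task descriptions below are all hypothetical and are not the authors' implementation.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def decomposed_text_to_sql(question: str, schema: str, llm: LLM,
                           subtasks: List[str]) -> str:
    """Solve each sub-task in order, feeding earlier solutions into later prompts."""
    context = f"Schema:\n{schema}\n\nQuestion: {question}\n"
    for task in subtasks:
        # Ask the model to solve one sub-problem given everything so far.
        answer = llm(f"{context}\nSub-task: {task}\nAnswer:").strip()
        # Carry the solution forward so later steps can condition on it.
        context += f"\n{task}: {answer}"
    # The final generation sees the question, the schema, and all sub-solutions.
    return llm(f"{context}\n\nFinal SQL:").strip()
```

To try it, pass any function that wraps a model call as `llm`, together with an ordered list of sub-task descriptions such as "identify relevant tables and columns". A plain few-shot baseline corresponds to calling the same function with an empty `subtasks` list.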