In recent years, a growing number of people have come to depend on data manipulation tasks in their work. However, many of these users lack the programming background required to write complex programs, particularly SQL queries. One way to help them is to synthesize the SQL query automatically from a small set of examples. Several program synthesizers for SQL have recently been proposed, but they do not leverage multicore architectures. This paper proposes CUBES, a parallel program synthesizer for the domain of SQL queries using input-output examples. Because input-output examples under-specify the desired SQL query, the synthesized query sometimes does not match the user's intent. CUBES incorporates a new disambiguation procedure based on fuzzing techniques that interacts with the user and increases the confidence that the returned query matches the user's intent. We perform an extensive evaluation on around 4000 SQL queries from different domains. Experimental results show that our sequential version solves more instances than other state-of-the-art SQL synthesizers. Moreover, the parallel approach scales up to 16 processes, with super-linear speedups for many hard instances. Our disambiguation approach is critical to achieving an accuracy of around 60%, significantly higher than that of other SQL synthesizers.
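To make the under-specification problem concrete, the following is a minimal sketch of fuzzing-based disambiguation. It is not the CUBES implementation: the table `t`, the two candidate queries, and the helpers `run_query`, `fuzz`, and `find_distinguishing_input` are all hypothetical and chosen only for illustration. Both candidates reproduce the single input-output example, so the sketch randomly mutates the input until the two queries disagree. Such an input could then be shown to the user, who decides which output is the intended one.

```python
import random
import sqlite3


def run_query(rows, query):
    """Load rows into a fresh in-memory table t(name, dept, salary) and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = sorted(conn.execute(query).fetchall())  # sort for order-insensitive comparison
    conn.close()
    return result


def fuzz(rows, rng):
    """Return a copy of rows with one salary replaced by a random value (assumed mutation)."""
    mutated = [list(r) for r in rows]
    i = rng.randrange(len(mutated))
    mutated[i][2] = rng.randint(0, 200)
    return [tuple(r) for r in mutated]


def find_distinguishing_input(rows, q1, q2, tries=1000, seed=0):
    """Search for a mutated input on which the two candidate queries give different outputs."""
    rng = random.Random(seed)
    for _ in range(tries):
        candidate = fuzz(rows, rng)
        if run_query(candidate, q1) != run_query(candidate, q2):
            return candidate
    return None  # no difference found within the budget


# The user's single input-output example (hypothetical data).
example_input = [("ana", "eng", 120), ("bob", "eng", 90), ("cat", "ops", 80)]
example_output = [("ana",)]

# Two candidate queries that both satisfy the example but mean different things.
q1 = "SELECT name FROM t WHERE salary > 100"
q2 = "SELECT name FROM t WHERE salary = (SELECT MAX(salary) FROM t)"

assert run_query(example_input, q1) == example_output
assert run_query(example_input, q2) == example_output

# An input on which the candidates disagree; the user would be asked which output is right.
witness = find_distinguishing_input(example_input, q1, q2)
```

In this sketch, each answer from the user would rule out at least one candidate, which is how interaction can raise confidence that the remaining query matches the user's intent.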