Customer demand, regulatory pressure, and engineering efficiency are driving an industry-wide shift away from siloed engines and services optimized in isolation and toward highly integrated solutions. Evidence of this shift includes the wide adoption of open formats, shared component libraries, and the meteoric success of integrated data lake experiences such as Microsoft Fabric. In this paper, we study the implications of this trend for query optimizers (QOs) and discuss our experience building Calcite and extending Cascades into the QO components of Microsoft SQL Server, Fabric Data Warehouse (DW), and SCOPE. We weigh the pros and cons of a drastic change in direction: moving from bespoke QOs or library sharing (à la Calcite) to rewriting the QO stack and fully embracing Query Optimizer as a Service (QOaaS). We report on some early successes and stumbles as we explore these ideas with prototypes compatible with Fabric DW and Spark. The benefits include centralized workload-level optimizations, multi-engine federation, and accelerated feature creation, but the challenges are equally daunting. We plan to engage the CIDR audience in a debate on this exciting topic.