Modern OLAP engines are designed to support arbitrary analytical workloads, but this generality incurs structural overhead, including runtime schema interpretation, indirection layers, and abstraction boundaries, even in highly optimized systems. An engine specialized to a fixed workload can eliminate these costs and exploit workload-specific data structures and execution algorithms for substantially higher performance. Historically, constructing such bespoke engines has been economically impractical due to the high manual engineering effort. Recent advances in LLM-based code synthesis challenge this tradeoff by enabling automated system generation. However, naively prompting an LLM to produce a database engine does not yield a correct or efficient design, as effective synthesis requires systematic performance feedback, structured refinement, and careful management of deep architectural interdependencies. We present Bespoke OLAP, a fully autonomous synthesis pipeline for constructing high-performance database engines tightly tailored to a given workload. Our approach integrates iterative performance evaluation and automated validation to guide synthesis from storage to query execution. We demonstrate that Bespoke OLAP can generate a workload-specific engine from scratch within minutes to hours, achieving order-of-magnitude speedups over modern general-purpose systems such as DuckDB.
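To make the contrast between generality and specialization concrete, here is a minimal sketch of the kind of overhead the abstract describes. It is not code produced by Bespoke OLAP: the toy table, the column names, and the functions `generic_sum` and `revenue_eu` are all hypothetical and chosen only for illustration. The generic path resolves column names through a schema and indexes into row tuples at runtime, while the specialized path hard-codes one query over a columnar, dictionary-encoded layout.

```python
from array import array

# Hypothetical toy table, stored row-wise as (price, quantity, region).
ROWS = [(10.0, 2, "EU"), (5.0, 4, "US"), (7.5, 1, "EU")]
SCHEMA = {"price": 0, "quantity": 1, "region": 2}


def generic_sum(rows, schema, filter_col, filter_val, expr_cols):
    """Generic path: the schema is interpreted at runtime for any query shape."""
    filter_idx = schema[filter_col]           # name lookup through the schema
    expr_idxs = [schema[c] for c in expr_cols]
    total = 0.0
    for row in rows:
        if row[filter_idx] == filter_val:     # string comparison per row
            product = 1.0
            for i in expr_idxs:               # inner loop over expression columns
                product *= row[i]
            total += product
    return total


# Specialized path (assumed layout): one typed array per column, with the
# region column dictionary-encoded as small integers (0 = EU, 1 = US).
PRICE = array("d", [10.0, 5.0, 7.5])
QUANTITY = array("q", [2, 4, 1])
REGION_CODE = array("b", [0, 1, 0])
EU = 0


def revenue_eu():
    """Specialized path: the query, layout, and encoding are fixed in the code."""
    total = 0.0
    for price, quantity, region in zip(PRICE, QUANTITY, REGION_CODE):
        if region == EU:                      # integer comparison, no lookup
            total += price * quantity
    return total


generic_result = generic_sum(ROWS, SCHEMA, "region", "EU", ["price", "quantity"])
specialized_result = revenue_eu()
```

Both functions compute the same aggregate, but only the second can drop the schema lookup, the per-row string comparison, and the inner loop over expression columns. A real specialized engine would apply the same idea in a compiled language and across storage, indexing, and execution.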