Modern analytical workloads increasingly combine relational data with array-valued attributes. Although columnar database systems process such workloads efficiently, their ability to optimize queries that interleave relational operators with array manipulations remains limited. This paper introduces A3D-RA, an extended relational algebra supporting array-valued attributes, together with a comprehensive framework for algebraic reasoning and optimization. We formalize its data model and semantics, develop a complete set of equivalence-preserving transformation rules capturing pairwise interactions between relational and array operators, and propose a plan enumeration strategy that comes with an optimality guarantee and remains polynomial in all non-join operators. We design A3D-RA as a modular, backend-independent optimization layer that can be instantiated over existing analytical database systems. Experimental results across three high-performance engines on a real-world workload show consistent performance gains enabled by the proposed algebraic optimization layer.