This article introduces Figaro, an algorithm for computing the upper-triangular matrix in the QR decomposition of the matrix defined by the natural join over relational data. Figaro's main novelty is that it pushes the QR decomposition past the join, which leads to several desirable properties. For acyclic joins, it takes time linear in the database size and independent of the join size. Its execution is equivalent to applying a sequence of Givens rotations whose length is proportional to the join size. The ratio of its number of rounding errors to that of classical QR decomposition algorithms is on par with the ratio of the database size to the join output size. The QR decomposition lies at the core of many linear algebra computations, including the singular value decomposition (SVD) and principal component analysis (PCA). We show how Figaro can be used to compute the orthogonal matrix in the QR decomposition, the SVD, and the PCA of the join output without materializing the join output. A suite of experiments validates that Figaro can outperform the LAPACK library Intel MKL in both runtime and numerical accuracy, by a factor proportional to the gap between the sizes of the join output and input.
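To make the target of the computation concrete, the following is a minimal sketch of the materialized baseline that Figaro is designed to avoid, not of Figaro itself. It joins two tiny hypothetical relations `S(key, a)` and `T(key, b)`, builds the join matrix over the data columns, and reduces it to its upper-triangular factor with plain Givens rotations. The relation contents, the helper name `givens_r`, and the choice of NumPy are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy relations: (join key, data value).
S = [(1, 1.0), (1, 2.0), (2, 3.0)]
T = [(1, 4.0), (1, 5.0), (2, 6.0), (2, 7.0)]

# Materialize the natural join on the key and keep the data columns [a, b].
join_matrix = np.array(
    [[a, b] for (ks, a) in S for (kt, b) in T if ks == kt]
)


def givens_r(matrix):
    """Return the upper-triangular factor R using Givens rotations."""
    r = np.array(matrix, dtype=float)
    rows, cols = r.shape
    for j in range(min(rows, cols)):
        # Zero out column j below the diagonal, one row at a time.
        for i in range(rows - 1, j, -1):
            x, y = r[j, j], r[i, j]
            h = np.hypot(x, y)
            if h == 0.0:
                continue  # nothing to rotate
            c, s = x / h, y / h
            top = c * r[j] + s * r[i]    # rotated pivot row
            bot = -s * r[j] + c * r[i]   # entry (i, j) becomes zero
            r[j], r[i] = top, bot
    return r[:cols] if rows >= cols else r


R = givens_r(join_matrix)
print(R)
```

The baseline touches every row of the join output, so its number of rotations grows with the join size. The abstract's claim is that the same `R` can be obtained, for acyclic joins, in time linear in the input relations without building `join_matrix` at all.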