An Efficient Randomized QLP Algorithm for Approximating the Singular Value Decomposition

In this paper, we introduce a randomized QLP decomposition called Rand-QLP. Operating on a matrix $\bf A$, Rand-QLP gives ${\bf A}={\bf QLP}^T$, where $\bf Q$ and $\bf P$ have orthonormal columns and $\bf L$ is lower-triangular. Under the assumption that the rank of the input matrix is $k$, we derive several error bounds for Rand-QLP: bounds for the first $k$ approximate singular values and for the trailing block of the middle factor $\bf L$, which show that the decomposition is rank-revealing; bounds for the distance between the approximate and exact subspaces for all four fundamental subspaces of a given matrix; and bounds for the errors of low-rank approximations constructed from the columns of $\bf Q$ and $\bf P$. Because it relies on random sampling and the unpivoted QR decomposition, Rand-QLP can effectively leverage modern computational architectures, thus addressing a serious bottleneck associated with classical algorithms such as the singular value decomposition (SVD), column-pivoted QR (CPQR) and most recent matrix decomposition algorithms. We benchmark the algorithms on an Intel Xeon Gold 6240 CPU running at 2.6 GHz with an NVIDIA GeForce RTX 2080Ti GPU. In comparison to CPQR and the SVD, Rand-QLP achieves speedups of up to 5 times and 6.6 times, respectively, on the CPU, and up to 3.8 times and 4.4 times, respectively, with the hybrid GPU architecture. In terms of quality of approximation, our results on synthetic and real data show that the approximations by Rand-QLP are comparable to those of pivoted QLP and the optimal SVD, and in most cases are considerably better than those of CPQR.
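
As a rough illustration of how random sampling and unpivoted QR can be combined into a QLP-type factorization, consider the NumPy sketch below. It is not the exact Rand-QLP procedure, which the abstract does not spell out: the function name `rand_qlp_sketch`, the sampling parameter `d`, the Gaussian test matrix, and the particular sequence of steps are all assumptions made for illustration. The sketch assumes `d` is at least the rank $k$ of $\bf A$ and at most $\min(m, n)$.

```python
import numpy as np


def rand_qlp_sketch(A, d, seed=0):
    """Illustrative randomized QLP-type factorization (hypothetical helper).

    Returns Q (m x d), L (d x d, lower-triangular) and P (n x d) such that
    A is approximately Q @ L @ P.T. Only unpivoted QR is used.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)

    # Random sampling: compress the row space of A into d columns.
    Omega = rng.standard_normal((m, d))
    C = A.T @ Omega                      # n x d

    # Orthonormal basis for the sampled row space (unpivoted QR).
    P_bar, _ = np.linalg.qr(C)           # n x d

    # Project A onto that basis and factor the result (unpivoted QR).
    D = A @ P_bar                        # m x d
    Q, R = np.linalg.qr(D)               # Q: m x d, R: d x d upper-triangular

    # A second unpivoted QR on R^T turns the middle factor lower-triangular:
    # R^T = P_tilde @ Lt  implies  R = Lt^T @ P_tilde^T.
    P_tilde, Lt = np.linalg.qr(R.T)
    L = Lt.T                             # d x d lower-triangular
    P = P_bar @ P_tilde                  # n x d, orthonormal columns

    return Q, L, P


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic rank-5 test matrix of size 60 x 40.
    A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
    Q, L, P = rand_qlp_sketch(A, d=8)
    rel_err = np.linalg.norm(A - Q @ L @ P.T) / np.linalg.norm(A)
    print("relative reconstruction error:", rel_err)
```

In this sketch the product ${\bf QLP}^T$ equals $\bf A$ projected onto the sampled row space, so for an input of exact rank $k \le d$ the reconstruction error is at the level of rounding. For matrices that are only approximately low-rank, the error depends on the spectrum and on `d`.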
