Standard rank-revealing factorizations such as the singular value decomposition (SVD) and the column-pivoted QR factorization are challenging to implement efficiently on a GPU. A major difficulty is that the standard algorithms cannot cast most of their operations in terms of the Level-3 BLAS. This paper presents two alternative algorithms for computing a rank-revealing factorization of the form $A = U T V^*$, where $U$ and $V$ are orthogonal and $T$ is triangular. Both algorithms use randomized projection techniques to cast most of the flops in terms of matrix-matrix multiplication, which is exceptionally efficient on the GPU. Numerical experiments illustrate that these algorithms achieve an order-of-magnitude acceleration over finely tuned GPU implementations of the SVD while providing low-rank approximation errors close to those of the SVD.
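To make the form $A = U T V^*$ concrete, here is a minimal NumPy sketch of one simple way to build such a factorization with a randomized projection. It is an illustration under stated assumptions, not necessarily either of the two algorithms the paper presents: the function name `randomized_utv_sketch`, the power parameter `q`, and the restriction to real matrices with at least as many rows as columns are all choices made for this example. It runs on the CPU, but apart from the two QR factorizations all of the arithmetic is matrix-matrix multiplication, which is the property the abstract emphasizes.

```python
import numpy as np


def randomized_utv_sketch(A, q=1, seed=0):
    """Return U, T, V with A ~= U @ T @ V.T (illustrative sketch only).

    Assumptions for this example: A is real with shape (m, n), m >= n;
    q is a hypothetical number of power-iteration steps.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # Randomized projection: start from a Gaussian test matrix and apply
    # (A^T A) q times. Each step is two matrix-matrix products.
    Y = rng.standard_normal((n, n))
    for _ in range(q):
        Y = A.T @ (A @ Y)

    # Orthonormalize the sampled basis to get the right factor V (n x n).
    V, _ = np.linalg.qr(Y)

    # An unpivoted QR of A @ V gives an orthonormal U (m x n) and an
    # upper-triangular T (n x n), so that A @ V = U @ T.
    U, T = np.linalg.qr(A @ V)
    return U, T, V
```

Because `V` is square and orthogonal, `A @ V = U @ T` rearranges to `A = U @ T @ V.T`. A rank-`k` approximation can then be read off as `U[:, :k] @ T[:k, :] @ V.T`; how close its error comes to that of the truncated SVD depends on the matrix and on `q`, and this sketch makes no claim about that. For large `q` the repeated products can lose accuracy in floating point, so a more careful version would re-orthonormalize `Y` between steps.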