The singular value decomposition (SVD) is a powerful tool in modern numerical linear algebra that underpins computational methods such as principal component analysis (PCA), low-rank approximation, and randomized algorithms. Many practical scenarios require solving a large number of small SVD problems, a regime generally referred to as "batch SVD". Existing programming models handle this workload efficiently on parallel CPU architectures, but high-performance solutions for GPUs remain immature. A GPU-oriented batch SVD solver is introduced. The solver uses the one-sided Jacobi algorithm to exploit fine-grained parallelism, and a number of algorithmic and design optimizations give it unmatched performance. Starting from a baseline solver, a sequence of optimizations is applied, each yielding an incremental performance gain. Numerical experiments show that the new solver is robust across problems with different numerical properties, matrix shapes, and arithmetic precisions. Performance benchmarks on both NVIDIA and AMD systems show significant speedups over vendor solutions as well as existing open-source solvers.
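For readers unfamiliar with the one-sided Jacobi algorithm named above, the following is a minimal CPU reference sketch in Python with NumPy, not the solver described in the abstract. It uses the textbook Hestenes formulation: plane rotations are applied to pairs of columns until all columns are mutually orthogonal, at which point the column norms are the singular values. The function names `one_sided_jacobi_svd` and `batch_svd`, the tolerance, and the sweep limit are illustrative assumptions, and the batch is processed with a plain sequential loop, whereas a GPU solver would process the independent problems concurrently.

```python
import numpy as np


def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Hestenes one-sided Jacobi SVD of an m x n matrix with m >= n.

    Returns (U, sigma, V) with A ~= (U * sigma) @ V.T.
    Illustrative reference code; tol and max_sweeps are assumed defaults.
    """
    U = np.array(A, dtype=np.float64)  # working copy; columns get orthogonalized
    n = U.shape[1]
    V = np.eye(n)                      # accumulates the applied rotations
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                # Skip pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of columns p, q.
                zeta = (beta - alpha) / (2.0 * gamma)
                if zeta == 0.0:
                    t = 1.0
                else:
                    t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to the columns of U and of V.
                for M in (U, V):
                    col_p = M[:, p].copy()
                    M[:, p] = c * col_p - s * M[:, q]
                    M[:, q] = s * col_p + c * M[:, q]
        if not rotated:
            break  # a full sweep without rotations means convergence
    sigma = np.linalg.norm(U, axis=0)
    order = np.argsort(sigma)[::-1]    # sort singular values in descending order
    sigma, U, V = sigma[order], U[:, order], V[:, order]
    U = U / np.where(sigma > 0, sigma, 1.0)  # normalize; zero columns stay zero
    return U, sigma, V


def batch_svd(batch):
    """Apply the solver to each matrix of a (batch, m, n) array in turn."""
    return [one_sided_jacobi_svd(A) for A in batch]
```

Each column pair touches only two columns, so disjoint pairs can be rotated independently, and each matrix in the batch is independent of the others. These two levels of independence are what make the method a natural fit for fine-grained parallel hardware.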