The singular value decomposition (SVD) is a powerful tool in modern numerical linear algebra that underpins computational methods such as principal component analysis (PCA), low-rank approximation, and randomized algorithms. Many practical scenarios require solving a large number of small SVD problems, a regime generally referred to as "batch SVD". Existing programming models handle this workload efficiently on parallel CPU architectures, but high-performance solutions for GPUs remain immature. A GPU-oriented batch SVD solver is introduced. The solver uses the one-sided Jacobi algorithm to exploit fine-grained parallelism, and a number of algorithmic and design optimizations achieve unmatched performance. Starting from a baseline solver, a sequence of optimizations is applied, each yielding an incremental performance gain. Numerical experiments show that the new solver is robust across problems with different numerical properties, matrix shapes, and arithmetic precisions. Performance benchmarks on both NVIDIA and AMD systems show significant speedups over vendor solutions as well as existing open-source solvers.
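For readers unfamiliar with the one-sided Jacobi algorithm, the following is a minimal CPU reference sketch in NumPy, not the GPU solver described above. It assumes real matrices with at least as many rows as columns, a cyclic pair ordering, and a fixed relative tolerance; the names `one_sided_jacobi_svd` and `batched_svd`, and the parameters `tol` and `max_sweeps`, are hypothetical choices made for illustration. Each plane rotation touches only two columns, which is the source of the fine-grained parallelism the method offers.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=60):
    """Hestenes one-sided Jacobi SVD of one real m x n matrix (m >= n assumed).

    Returns U (m x n), sigma (n,), V (n x n) with A ~= U @ diag(sigma) @ V.T.
    """
    U = np.array(A, dtype=np.float64)  # working copy; its columns get orthogonalized
    n = U.shape[1]
    V = np.eye(n)                      # accumulates the rotations, so A @ V == U
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                # Skip pairs that are already orthogonal to working accuracy.
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of columns p and q.
                zeta = (beta - alpha) / (2.0 * gamma)
                if zeta == 0.0:
                    t = 1.0
                else:
                    t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to the columns of U and of V.
                for M in (U, V):
                    col_p = M[:, p].copy()
                    M[:, p] = c * col_p - s * M[:, q]
                    M[:, q] = s * col_p + c * M[:, q]
        if not rotated:  # a full sweep without rotations means convergence
            break
    sigma = np.linalg.norm(U, axis=0)   # column norms are the singular values
    order = np.argsort(sigma)[::-1]     # sort in descending order
    sigma = sigma[order]
    U = U[:, order] / np.where(sigma > 0.0, sigma, 1.0)
    return U, sigma, V[:, order]

def batched_svd(batch):
    """Apply the solver independently to each matrix of a (batch, m, n) array."""
    return [one_sided_jacobi_svd(A) for A in batch]
```

The matrices in a batch are independent, so the loop in `batched_svd` is a sequential stand-in for work that a GPU solver would run concurrently; how that work is mapped to the hardware is not specified in the abstract and is not modeled here.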