Chebyshev Filtered Subspace Iteration (ChFSI) has been widely adopted for computing a small subset of extremal eigenvalues of large sparse matrices. This work introduces a residual-based reformulation of ChFSI, referred to as R-ChFSI, designed to accommodate inexact matrix-vector products while maintaining robust convergence properties. By reformulating the traditional Chebyshev recurrence to operate on residuals rather than on eigenvector estimates, R-ChFSI suppresses the errors incurred in matrix-vector products and improves convergence for both standard and generalized eigenproblems. This tolerance to inexact matrix-vector products allows approximate inverses to be used in large-scale generalized eigenproblems, which makes the method particularly attractive when applying the inverse through exact matrix factorizations or iterative methods is computationally expensive. It also allows the matrix-vector products to be computed in lower-precision arithmetic, so that modern hardware accelerators can be leveraged. Through extensive benchmarking, we demonstrate that R-ChFSI achieves the desired residual tolerances while using low-precision arithmetic. For problems with millions of degrees of freedom and thousands of eigenvalues, R-ChFSI attains final residual norms in the range of $10^{-12}$ to $10^{-14}$, even with FP32 and TF32 arithmetic, significantly outperforming standard ChFSI in similar settings. In generalized eigenproblems where approximate inverses are used, R-ChFSI achieves residual tolerances up to ten orders of magnitude lower, demonstrating its robustness to approximation errors. Overall, R-ChFSI provides a scalable and computationally efficient alternative for solving large-scale eigenproblems in high-performance computing environments.
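For orientation, the following is a minimal NumPy sketch of the standard ChFSI baseline that the abstract refers to, not of the R-ChFSI recurrence itself, which the abstract does not spell out. It assumes a real symmetric standard eigenproblem and uses the commonly described scaled three-term Chebyshev recurrence followed by a Rayleigh-Ritz step. The names `chebyshev_filter`, `rayleigh_ritz`, `chfsi`, the `guard` block padding, and the test matrix are all illustrative choices, not taken from the paper.

```python
import numpy as np


def chebyshev_filter(matvec, X, degree, a, b, a0):
    """Apply a scaled Chebyshev polynomial of the operator to the block X.

    Components with eigenvalues in [a, b] are damped, those below a are
    amplified. a0 estimates the smallest eigenvalue and only sets the scaling.
    """
    e = (b - a) / 2.0  # half-width of the damped interval
    c = (b + a) / 2.0  # centre of the damped interval
    sigma1 = e / (a0 - c)
    sigma = sigma1
    Y = (matvec(X) - c * X) * (sigma1 / e)
    for _ in range(2, degree + 1):
        sigma_new = 1.0 / (2.0 / sigma1 - sigma)
        Y_new = 2.0 * (sigma_new / e) * (matvec(Y) - c * Y) - (sigma * sigma_new) * X
        X, Y, sigma = Y, Y_new, sigma_new
    return Y


def rayleigh_ritz(matvec, X, k):
    """Ritz values/vectors in span(X) and residual norms of the lowest k pairs."""
    AX = matvec(X)
    H = X.T @ AX
    theta, Q = np.linalg.eigh((H + H.T) / 2.0)  # symmetrize against round-off
    X, AX = X @ Q, AX @ Q
    res = np.linalg.norm(AX[:, :k] - X[:, :k] * theta[:k], axis=0)
    return theta, X, res


def chfsi(matvec, n, k, upper_bound, guard=3, degree=12, tol=1e-8, max_iter=100, seed=0):
    """Standard ChFSI sketch for the k lowest eigenpairs of a symmetric operator.

    matvec(X) must return A @ X for an (n, m) block X; upper_bound must be
    >= the largest eigenvalue of A. guard adds extra block vectors (assumption).
    """
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((n, k + guard)))
    theta, X, res = rayleigh_ritz(matvec, X, k)
    for _ in range(max_iter):
        if res.max() < tol:
            break
        # Damp everything between the largest Ritz value and the upper bound.
        Y = chebyshev_filter(matvec, X, degree, theta[-1], upper_bound, theta[0])
        X, _ = np.linalg.qr(Y)  # re-orthonormalize the filtered block
        theta, X, res = rayleigh_ritz(matvec, X, k)
    return theta[:k], X[:, :k], res


# Illustrative test problem: a diagonally dominated symmetric matrix.
rng = np.random.default_rng(1)
n, k = 200, 5
G = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1)) + 1e-2 * (G + G.T) / 2.0
# The matrix 1-norm is a cheap, always valid upper bound on the largest eigenvalue.
vals, vecs, res = chfsi(lambda V: A @ V, n, k, upper_bound=np.linalg.norm(A, 1))
print("largest residual norm:", res.max())
```

Because the operator enters only through the `matvec` callable, an inexact product can be tried by passing, for example, a callable that casts to `np.float32` before multiplying; this is the kind of setting in which the abstract reports that the standard recurrence falls short of R-ChFSI.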