The locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm is a popular approach for computing a few of the smallest eigenvalues and the corresponding eigenvectors of a large Hermitian positive definite matrix A. In this work, we propose a mixed precision variant of LOBPCG that uses a (sparse) Cholesky factorization of A computed in reduced precision as the preconditioner. To further enhance performance, we also propose a mixed precision orthogonalization strategy. To analyze how reducing the precision of the preconditioner affects performance, we carry out a rounding error and convergence analysis of preconditioned inverse iteration (PINVIT), a simplified variant of LOBPCG. Our theoretical results predict, and our numerical experiments confirm, that the impact on convergence remains marginal. In practice, our mixed precision LOBPCG algorithm typically reduces the computation time by a factor of 1.4--2.0 on both CPUs and GPUs.
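To make the idea concrete, the following is a minimal sketch of single-vector PINVIT with a reduced precision Cholesky preconditioner. It is an illustration under stated assumptions, not the implementation evaluated in this work: it uses a dense float32 factorization from SciPy in place of a sparse one, handles one eigenpair instead of a block, and the names `pinvit_mixed` and `laplacian_1d` are hypothetical helpers introduced here.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def laplacian_1d(n):
    """Hypothetical test matrix: dense 1D Laplacian tridiag(-1, 2, -1), which is SPD."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def pinvit_mixed(A, x0, tol=1e-10, maxit=500):
    """Sketch of PINVIT for the smallest eigenpair of a real SPD matrix A.

    Assumption: the preconditioner is a dense Cholesky factorization of A
    computed in float32; residuals and Rayleigh quotients stay in float64.
    """
    # Factor A once in reduced precision (float32).
    chol32 = cho_factor(A.astype(np.float32))

    x = x0 / np.linalg.norm(x0)
    rho = x @ (A @ x)
    for k in range(maxit):
        Ax = A @ x
        rho = x @ Ax                 # Rayleigh quotient in float64
        r = Ax - rho * x             # eigen-residual in float64
        if np.linalg.norm(r) <= tol:
            break
        # Apply the preconditioner in float32, then promote back to float64.
        w = cho_solve(chol32, r.astype(np.float32)).astype(np.float64)
        x = x - w                    # PINVIT update: x - T^{-1} (A x - rho x)
        x /= np.linalg.norm(x)
    return rho, x, k


if __name__ == "__main__":
    A = laplacian_1d(50)
    x0 = np.random.default_rng(0).standard_normal(50)
    lam, vec, iters = pinvit_mixed(A, x0)
    print("smallest eigenvalue estimate:", lam, "after", iters, "iterations")
```

Only the factorization and the triangular solves run in float32; the residual is always formed in float64, so the reduced precision perturbs the preconditioner rather than the computed eigenpair.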