The solution of sparse systems of linear equations is ubiquitous in scientific applications. Iterative methods, such as the Preconditioned Conjugate Gradient (PCG) method, are normally chosen over direct methods due to memory and computational complexity constraints. However, the efficiency of these methods depends on the preconditioner used. Developing a preconditioner normally requires some insight into the sparse linear system and into the desired trade-off between the cost of generating the preconditioner and the resulting reduction in iteration count. Incomplete factorization methods tend to serve as black-box methods for generating these preconditioners, but they may fail for a number of reasons, including numerical issues that require searching for adequate scaling, shifting, and fill-in, all while relying on an algorithm that is difficult to parallelize. With the move toward heterogeneous computing, many sparse applications leave GPUs, which are optimized for dense tensor workloads such as training neural networks, underutilized. In this work, we demonstrate that a simple artificial neural network, trained either at compile time or in parallel with the running application on a GPU, can provide an incomplete sparse Cholesky factorization that can be used as a preconditioner. In terms of iteration-count reduction, this generated preconditioner is as good as or better than preconditioners obtained through multiple standard techniques such as scaling and shifting. Moreover, the proposed method always succeeds: it never produces a preconditioner that fails to reduce the iteration count.
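While the neural-network-generated factorization is the contribution of the paper itself, the PCG workflow it plugs into is standard. The following is a minimal sketch of that workflow in Python, using SciPy's spilu incomplete LU factorization as a stand-in for an incomplete Cholesky preconditioner (SciPy ships no built-in incomplete Cholesky); the 1D Poisson test matrix and the drop_tol and fill_factor values are illustrative assumptions, not values from this work.

```python
# Minimal PCG-with-incomplete-factorization sketch (assumptions noted above).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Small symmetric positive definite test system: 1D Poisson matrix.
n = 1000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Incomplete LU factorization as the preconditioner M ~ A. drop_tol and
# fill_factor expose the trade-off between the cost of generating the
# preconditioner and the iteration-count reduction it delivers.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10.0)
M = spla.LinearOperator((n, n), matvec=ilu.solve)

iters = 0
def count(xk):
    """Count PCG iterations via the solver callback."""
    global iters
    iters += 1

# rtol requires SciPy >= 1.12; older versions use tol instead.
x, info = spla.cg(A, b, M=M, rtol=1e-8, callback=count)
print(f"converged: {info == 0}, iterations: {iters}, "
      f"residual: {np.linalg.norm(A @ x - b):.2e}")
```

Rerunning the script with M omitted shows the unpreconditioned iteration count, which makes the reduction attributed to the preconditioner directly measurable; the drop_tol and fill_factor knobs then trade setup cost against that reduction, exactly the balance the abstract describes.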