The paper describes a sparse direct solver for the linear systems that arise from the discretization of an elliptic PDE on a two-dimensional domain. The solver is designed to reduce communication costs and to perform well on GPUs. It uses a two-level framework, which is easier to implement and optimize than traditional multi-frontal schemes based on hierarchical nested dissection orderings. The scheme decomposes the domain into thin subdomains, or "slabs". Within each slab, a local factorization is executed that exploits the geometry of the local domain. A global factorization is then obtained through the LU factorization of a block-tridiagonal reduced coefficient matrix. The solver has complexity $O(N^{5/3})$ for the factorization step, and $O(N^{7/6})$ for each solve once the factorization is completed. The solver is compatible with a range of different local discretizations, and numerical experiments demonstrate its performance for regular discretizations of rectangular and curved geometries. The technique becomes particularly efficient when combined with very high-order convergent multi-domain spectral collocation schemes. With this discretization, a Helmholtz problem on a domain of size $1000 \lambda \times 1000 \lambda$ (for which $N = 100\,\mathrm{M}$) is solved in 15 minutes to six correct digits on a high-powered desktop with GPU acceleration.
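The global step mentioned above, an LU factorization of a block-tridiagonal reduced matrix, can be illustrated with a minimal sketch. The abstract does not specify how the reduced matrix is assembled or whether any pivoting is used, so the code below is only a generic block Thomas (block LU) algorithm on randomly generated, strongly diagonally weighted blocks. The function names `block_tridiag_factor`, `block_tridiag_solve`, and `demo` are hypothetical and are not taken from the paper.

```python
import numpy as np


def block_tridiag_factor(diag, lower, upper):
    """Block LU of a block-tridiagonal matrix, with no pivoting across blocks.

    diag[i]  : (m, m) diagonal block T[i, i]
    lower[i] : (m, m) sub-diagonal block T[i+1, i]
    upper[i] : (m, m) super-diagonal block T[i, i+1]

    Returns the Schur complements S_i (the diagonal blocks of U) and the
    multipliers L_i = T[i+1, i] @ inv(S_i) (the sub-diagonal blocks of L).
    """
    schur = [diag[0].copy()]
    mult = []
    for i in range(len(diag) - 1):
        # L_i = lower[i] @ inv(S_i), obtained from a solve with S_i^T.
        L = np.linalg.solve(schur[i].T, lower[i].T).T
        mult.append(L)
        schur.append(diag[i + 1] - L @ upper[i])
    return schur, mult


def block_tridiag_solve(schur, mult, upper, rhs):
    """Solve T x = rhs using the factors from block_tridiag_factor."""
    n = len(schur)
    # Forward substitution with the unit block lower-triangular factor.
    y = [rhs[0].copy()]
    for i in range(1, n):
        y.append(rhs[i] - mult[i - 1] @ y[i - 1])
    # Back substitution with the block upper-triangular factor.
    x = [None] * n
    x[-1] = np.linalg.solve(schur[-1], y[-1])
    for i in range(n - 2, -1, -1):
        x[i] = np.linalg.solve(schur[i], y[i] - upper[i] @ x[i + 1])
    return x


def demo(num_blocks=4, m=3, seed=0):
    """Return the residual norm ||T x - b|| on a small synthetic test problem."""
    rng = np.random.default_rng(seed)
    # Assumption: a large diagonal shift keeps every Schur complement invertible.
    diag = [rng.standard_normal((m, m)) + 20.0 * np.eye(m) for _ in range(num_blocks)]
    lower = [rng.standard_normal((m, m)) for _ in range(num_blocks - 1)]
    upper = [rng.standard_normal((m, m)) for _ in range(num_blocks - 1)]
    rhs = [rng.standard_normal(m) for _ in range(num_blocks)]

    # Dense copy of the same matrix, used only to check the answer.
    T = np.zeros((num_blocks * m, num_blocks * m))
    for i in range(num_blocks):
        T[i * m:(i + 1) * m, i * m:(i + 1) * m] = diag[i]
    for i in range(num_blocks - 1):
        T[(i + 1) * m:(i + 2) * m, i * m:(i + 1) * m] = lower[i]
        T[i * m:(i + 1) * m, (i + 1) * m:(i + 2) * m] = upper[i]

    schur, mult = block_tridiag_factor(diag, lower, upper)
    x = np.concatenate(block_tridiag_solve(schur, mult, upper, rhs))
    return float(np.linalg.norm(T @ x - np.concatenate(rhs)))


if __name__ == "__main__":
    print("residual norm:", demo())
```

In this sketch the factorization is computed once and `block_tridiag_solve` can then be called repeatedly for new right-hand sides, which mirrors the separation between the factorization cost and the per-solve cost stated in the abstract. The dense blocks and the absence of pivoting are simplifications made for illustration.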