Boundary value problems involving elliptic PDEs such as the Laplace and Helmholtz equations are ubiquitous in mathematical physics and engineering. Many such problems can alternatively be formulated as integral equations that are mathematically more tractable. However, an integral-equation formulation poses a significant computational challenge: solving the large dense linear systems that arise upon discretization. In cases where iterative methods converge rapidly, existing methods that draw on fast summation schemes such as the Fast Multipole Method are highly efficient and well-established. More recently, linear-complexity direct solvers that sidestep convergence issues by directly computing an invertible factorization have been developed. However, their storage and computation costs are high, which limits their ability to solve large-scale problems in practice. In this work, we introduce a distributed-memory parallel algorithm based on an existing direct solver named ``strong recursive skeletonization factorization.'' Specifically, we apply low-rank compression to certain off-diagonal matrix blocks in a way that minimizes computation and data movement. Compared to iterative algorithms, our method is particularly suitable for problems involving ill-conditioned matrices or multiple right-hand sides. Large-scale numerical experiments are presented to show the performance of our Julia implementation.