Boundary value problems involving elliptic PDEs such as the Laplace and Helmholtz equations are ubiquitous in physics and engineering. Many such problems have alternative formulations as integral equations that are mathematically more tractable than their PDE counterparts. However, discretizing an integral equation produces a dense linear system, which is challenging to solve. In cases where iterative methods converge rapidly, existing methods that draw on fast summation schemes such as the Fast Multipole Method are highly efficient and well established. More recently, linear-complexity direct solvers have been developed that sidestep convergence issues by directly computing an invertible factorization. However, their storage and compute costs are high, which limits their ability to solve large-scale problems in practice. In this work, we introduce a distributed-memory parallel algorithm based on an existing direct solver named ``strong recursive skeletonization factorization.'' The analysis of its parallel scalability applies generally to a class of existing methods that exploit so-called strong admissibility. Specifically, we apply low-rank compression to certain off-diagonal matrix blocks in a way that minimizes data movement. Given a compression tolerance, our method constructs an approximate factorization of a discretized integral operator (a dense matrix), which can be used to solve linear systems efficiently in parallel. Compared to iterative algorithms, our method is particularly suitable for problems involving ill-conditioned matrices or multiple right-hand sides. Large-scale numerical experiments are presented to demonstrate the performance of our implementation, which is written in the Julia language.
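To make the idea of tolerance-driven low-rank compression of an off-diagonal block concrete, the following is a minimal sketch in Python (standard library only). It is not the compression scheme of the solver described above: it swaps in a plain greedy column-pivoted Gram-Schmidt factorization, and the names `kernel_block` and `compress`, the kernel 1/|x - y|, and the two point clusters on [0, 1] and [3, 4] are all assumptions chosen for illustration. The block couples two well-separated clusters, which is the kind of block one would expect to be compressible under a strong admissibility condition.

```python
import math

def kernel_block(targets, sources):
    """Dense interaction block A[i][j] = 1 / |x_i - y_j| (illustrative kernel)."""
    return [[1.0 / abs(x - y) for y in sources] for x in targets]

def compress(A, tol):
    """Greedy column-pivoted Gram-Schmidt (hypothetical helper).

    Returns Q (list of k orthonormal columns of length m) and C (list of k
    coefficient rows of length n) such that A is approximately Q * C. The loop
    stops once every residual column has norm <= tol * (largest column norm of A).
    """
    m, n = len(A), len(A[0])
    # R[j] holds column j of the residual, i.e. the part of A not yet captured.
    R = [[A[i][j] for i in range(m)] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in R]
    threshold = tol * max(norms)
    Q, C = [], []
    while len(Q) < min(m, n):
        norms = [math.sqrt(sum(v * v for v in col)) for col in R]
        p = max(range(n), key=norms.__getitem__)  # pivot: largest residual column
        if norms[p] <= threshold:
            break
        q = [v / norms[p] for v in R[p]]
        # Projection of every residual column onto the new direction q.
        coeffs = [sum(qi * ri for qi, ri in zip(q, col)) for col in R]
        for j in range(n):
            c = coeffs[j]
            R[j] = [ri - c * qi for ri, qi in zip(R[j], q)]
        Q.append(q)
        C.append(coeffs)
    return Q, C

if __name__ == "__main__":
    n = 40
    targets = [i / (n - 1) for i in range(n)]        # cluster on [0, 1]
    sources = [3.0 + i / (n - 1) for i in range(n)]  # cluster on [3, 4]
    A = kernel_block(targets, sources)
    Q, C = compress(A, tol=1e-8)
    k = len(Q)
    err = max(
        abs(A[i][j] - sum(Q[r][i] * C[r][j] for r in range(k)))
        for i in range(n) for j in range(n)
    )
    print("numerical rank:", k, "of", n)
    print("stored entries:", k * 2 * n, "instead of", n * n)
    print("max entrywise error:", err)
```

In this sketch the tolerance controls the trade-off directly: a looser `tol` yields fewer columns in `Q` and less storage, while a tighter one yields a more accurate approximation of the block.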