Reduction multigrids have recently shown good performance in hyperbolic problems without the need for Gauss-Seidel smoothers. When applied to the hyperbolic limit of the Boltzmann Transport Equation (BTE), these methods achieve close to $\mathcal{O}(n)$ growth in work with problem size on unstructured grids. This scalability relies on the CF splitting producing an $A_\textrm{ff}$ block that is easy to invert. We introduce a parallel two-pass CF splitting designed to give a diagonally dominant $A_\textrm{ff}$. The first pass computes a maximal independent set in the symmetrized strong connections. The second pass converts F-points to C-points based on the row-wise diagonal dominance of $A_\textrm{ff}$. We find this two-pass CF splitting outperforms the common CF splittings available in hypre. Furthermore, parallelisation of reduction multigrids in hyperbolic problems is difficult, as we require both long-range grid-transfer operators and slow coarsenings (with rates of $\sim$1/2 in both 2D and 3D). We find that good parallel performance in the setup and solve depends on several factors: repartitioning the coarse grids, reducing the number of active MPI ranks as we coarsen, truncating the multigrid hierarchy, and applying a GMRES polynomial as a coarse-grid solver. We compare the performance of two different reduction multigrids: AIRG (which we developed previously) and the hypre implementation of $\ell$AIR. In the streaming limit with AIRG, we demonstrate 81\% weak scaling efficiency in the solve from 2 to 64 nodes (256 to 8192 cores) with only 8.8k unknowns per core, with solve times up to 5.9$\times$ smaller than the $\ell$AIR implementation in hypre.
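The two-pass CF splitting described above can be sketched in serial, dense form as follows. This is a minimal illustration under assumed conventions (the strong-connection threshold, the greedy MIS order, and the function name `two_pass_cf_split` are all hypothetical), not the paper's parallel implementation:

```python
import numpy as np

def two_pass_cf_split(A, strong_threshold=0.25):
    """Sketch of a two-pass CF splitting.

    Pass 1: greedy maximal independent set over symmetrized strong connections.
    Pass 2: flip F-points to C-points where the A_ff row lacks diagonal dominance.
    """
    n = A.shape[0]

    # Strong connections: |a_ij| >= theta * max_{k != i} |a_ik| (assumed classical criterion)
    S = np.zeros((n, n), dtype=bool)
    for i in range(n):
        off = np.abs(A[i]).copy()
        off[i] = 0.0
        if off.max() > 0:
            S[i] = np.abs(A[i]) >= strong_threshold * off.max()
            S[i, i] = False
    S = S | S.T  # symmetrize the strength graph

    # Pass 1: greedy MIS -> C-points; their strong neighbours become F-points
    state = np.full(n, "U", dtype=object)
    for i in range(n):
        if state[i] == "U":
            state[i] = "C"
            state[S[i] & (state == "U")] = "F"

    # Pass 2: promote F-points whose A_ff row is not diagonally dominant
    f_points = np.where(state == "F")[0]
    for i in f_points:
        others = [j for j in f_points if j != i and state[j] == "F"]
        if np.abs(A[i, i]) < np.sum(np.abs(A[i, others])):
            state[i] = "C"
    return state
```

On a 1D upwinded advection-like stencil the first pass alone already yields a strictly diagonally dominant $A_\textrm{ff}$, so the second pass leaves the splitting unchanged; the second pass matters for rows where strong F-F couplings survive the MIS.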