We present a matrix-free multigrid method for high-order discontinuous Galerkin (DG) finite element discretizations with GPU acceleration. We analyze its performance by comparing several data and compute layouts. Smoother implementations are optimized through localization and fast diagonalization techniques. By leveraging conflict-free access patterns in shared memory, an arithmetic throughput of up to 39% of the peak performance on NVIDIA A100 GPUs is achieved. Experimental results confirm that mixed-precision approaches and MPI parallelization are effective in accelerating the solver. Furthermore, solver efficiency and robustness are assessed in both two and three dimensions, with applications to Poisson problems.
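To illustrate the fast diagonalization idea mentioned above, the following is a minimal sketch on a model problem, not the DG operators or GPU kernels of this work. It assumes a 2D Poisson problem on a uniform n x n grid with unit mesh width and zero Dirichlet boundary values, so that the operator has the tensor-product form K U + U K = F with the 1D matrix K = tridiag(-1, 2, -1) and an identity mass matrix. The function names (`sine_basis`, `fast_diagonalization_solve`, `apply_operator`) are hypothetical and chosen for illustration only.

```python
import math


def sine_basis(n):
    """Eigenvectors (columns of S) and eigenvalues of K = tridiag(-1, 2, -1), size n x n."""
    scale = math.sqrt(2.0 / (n + 1))
    S = [[scale * math.sin((i + 1) * (k + 1) * math.pi / (n + 1)) for k in range(n)]
         for i in range(n)]
    lam = [2.0 - 2.0 * math.cos((k + 1) * math.pi / (n + 1)) for k in range(n)]
    return S, lam


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def apply_operator(U):
    """Evaluate K U + U K, i.e. the 5-point Laplacian with zero boundary values."""
    n = len(U)

    def at(i, j):
        return U[i][j] if 0 <= i < n and 0 <= j < n else 0.0

    return [[4.0 * U[i][j] - at(i - 1, j) - at(i + 1, j) - at(i, j - 1) - at(i, j + 1)
             for j in range(n)] for i in range(n)]


def fast_diagonalization_solve(F):
    """Solve K U + U K = F using only 1D eigendecompositions of K."""
    n = len(F)
    S, lam = sine_basis(n)
    St = transpose(S)
    # 1) Transform the right-hand side into the eigenbasis: F_hat = S^T F S.
    F_hat = matmul(matmul(St, F), S)
    # 2) The operator is diagonal there: divide entrywise by lam_i + lam_j.
    U_hat = [[F_hat[i][j] / (lam[i] + lam[j]) for j in range(n)] for i in range(n)]
    # 3) Transform back: U = S U_hat S^T.
    return matmul(matmul(S, U_hat), St)


if __name__ == "__main__":
    n = 5
    F = [[float((3 * i + 5 * j) % 7 + 1) for j in range(n)] for i in range(n)]
    U = fast_diagonalization_solve(F)
    R = apply_operator(U)
    residual = max(abs(R[i][j] - F[i][j]) for i in range(n) for j in range(n))
    print("max residual:", residual)
```

The point of the sketch is that the 2D inverse is applied through small 1D matrix products and one entrywise division, which is what makes the technique attractive for local smoother solves on tensor-product cells. In a DG setting the 1D mass matrix is generally not the identity, so a generalized eigenproblem would take the place of the closed-form sine basis assumed here.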