The Multilevel Fast Multipole Algorithm (MLFMA) is used in scientific modeling across telecommunications, physics, mechanics, and chemistry. Accelerating its far-field computation with GPUs and GPU clusters for large-scale problems has been studied for more than a decade. Accelerating the near-field computation (the P2P operator), however, has received less attention because it does not face the distributed-processing challenges that the far-field computation does. This article proposes a modification of the P2P algorithm and uses performance models to determine the criteria under which it is optimal. By modeling the speedup, we found that making threads independent by introducing redundancy in the data makes the algorithm nearly 13 times faster than the non-redundant mode for lower-density (higher-frequency) problems.
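To make the idea of trading data redundancy for thread independence concrete, the following is a minimal CPU-side sketch in plain Python, not the implementation evaluated in this article. It assumes a static 1/r kernel, a toy three-box geometry, and hypothetical helper names (`p2p_shared`, `build_redundant`, `p2p_redundant`); none of these come from the article. In the shared layout every box reads its neighbors' sources through indices into common arrays, whereas in the redundant layout each box first receives a private copy of all sources it needs, so that each box could in principle be processed by an independent thread or thread block.

```python
import math

def kernel(xt, xs, q):
    """Illustrative 1/r kernel (an assumption; the article's kernel may differ)."""
    r = math.dist(xt, xs)
    return q / r if r > 0.0 else 0.0  # skip the self-interaction

def p2p_shared(boxes, neighbors, points, charges):
    """Non-redundant layout: boxes read neighbor sources from shared arrays."""
    phi = [0.0] * len(points)
    for b, targets in enumerate(boxes):
        for t in targets:
            for nb in neighbors[b]:
                for s in boxes[nb]:
                    phi[t] += kernel(points[t], points[s], charges[s])
    return phi

def build_redundant(boxes, neighbors, points, charges):
    """Copy every box's near-field sources into a private contiguous list."""
    return [
        [(points[s], charges[s]) for nb in neighbors[b] for s in boxes[nb]]
        for b in range(len(boxes))
    ]

def p2p_redundant(boxes, private_sources, points):
    """Redundant layout: each box touches only its own private source copy."""
    phi = [0.0] * len(points)
    for b, targets in enumerate(boxes):
        local = private_sources[b]  # no access to any other box's data
        for t in targets:
            phi[t] = sum(kernel(points[t], xs, q) for xs, q in local)
    return phi

# Hypothetical toy problem: three boxes on a line, two points per box.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (2.5, 0.0)]
charges = [1.0] * len(points)
boxes = [[0, 1], [2, 3], [4, 5]]        # point indices owned by each box
neighbors = [[0, 1], [0, 1, 2], [1, 2]]  # near-field list, including the box itself

private_sources = build_redundant(boxes, neighbors, points, charges)
phi_shared = p2p_shared(boxes, neighbors, points, charges)
phi_private = p2p_redundant(boxes, private_sources, points)
stored_sources = sum(len(src) for src in private_sources)
```

In this toy case the redundant layout stores 14 source entries for 6 unique points, which illustrates the memory cost that a performance model has to weigh against the gain from independent threads.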