We introduce a distributed adaptive quadrature method that formulates multidimensional integration as a hierarchical domain decomposition problem on multi-GPU architectures. The integration domain is recursively partitioned into subdomains whose refinement is guided by local error estimators. Each subdomain evolves independently on a GPU, which exposes significant load imbalance as the adaptive process progresses. To address this challenge, we introduce a decentralised load redistribution scheme based on a cyclic round-robin policy. This strategy dynamically rebalances subdomains across devices through non-blocking, CUDA-aware MPI communication that overlaps with computation. The proposed strategy has two main advantages over a state-of-the-art GPU-tailored package: higher efficiency in high dimensions, and improved robustness with respect to the integrand regularity and the target accuracy.
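The redistribution idea can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the abstract does not specify how partners are paired or how much work is transferred, so the rule below (in round k, device r offers half of its surplus to device (r + shift) mod P, with the shift cycling through 1, ..., P-1) is an assumption, and the name `round_robin_rebalance` is hypothetical. Plain Python queues stand in for per-GPU subdomain lists, and no MPI or CUDA calls are involved.

```python
from collections import deque


def round_robin_rebalance(queues, round_index):
    """Run one redistribution round over per-device work queues (in place).

    Assumed policy (not taken from the paper): device r is paired with
    device (r + shift) % P and hands over half of its surplus, if any.
    """
    p = len(queues)
    if p < 2:
        return
    # The shift cycles through 1..P-1, so every device is eventually paired
    # with every other device.
    shift = 1 + round_index % (p - 1)
    # Decisions use a snapshot of the loads, mimicking devices that only
    # know the counts exchanged at the start of the round.
    counts = [len(q) for q in queues]
    for r in range(p):
        dst = (r + shift) % p
        surplus = (counts[r] - counts[dst]) // 2
        for _ in range(max(surplus, 0)):
            queues[dst].append(queues[r].pop())


# Hypothetical scenario: all 16 subdomains start on device 0 of 4.
queues = [deque(range(16)), deque(), deque(), deque()]
for k in range(2):
    round_robin_rebalance(queues, k)
loads = [len(q) for q in queues]
print(loads)  # -> [4, 4, 4, 4]
```

In a distributed setting, each transfer would presumably correspond to a non-blocking send and receive between two ranks, which is what would allow the exchange to overlap with the refinement of the subdomains that stay local.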