Parallel Spawning Strategies for Dynamic-Aware MPI Applications

Dynamic resource management is an increasingly important capability of High Performance Computing systems, as it enables jobs to adjust their resource allocation at runtime. This capability can reduce workload makespan, substantially decreasing job waiting times and optimizing resource allocation. In this context, malleability refers to the ability of applications to adapt to new resource allocations during execution. Although beneficial, malleability incurs significant reconfiguration costs, which makes reducing these costs an important research topic. Some existing solutions for MPI applications respawn the entire application, an expensive approach that prevents the reuse of the original processes. Other MPI solutions reuse them but fail to fully release unneeded processes when shrinking: some ranks within the same communicator remain active across nodes, which prevents the application from returning those nodes to the system. This work overcomes both limitations by proposing a novel parallel spawning strategy in which all processes cooperate in the spawning. This allows expansions to reuse processes while also allowing unneeded ones to be terminated. The strategy has been validated on two systems, with nodes having either equal or different numbers of cores. Experiments show that it preserves competitive expansion times, with overheads of at most $1.13\times$ and $1.25\times$ for equal and different numbers of cores per node, respectively. More importantly, it enables fast shrink operations, reducing their cost by at least $1387\times$ and $20\times$ in the same scenarios.
