Asynchronous Many-Task (AMT) systems offer a potential solution for efficiently programming complicated scientific applications on extreme-scale heterogeneous architectures. However, their communication needs differ from those of traditional bulk-synchronous parallel (BSP) applications, posing new challenges for the underlying communication libraries. This work systematically studies the communication needs of AMTs and explores how communication libraries can be structured to better satisfy them, through a case study of a real-world AMT system, HPX. We first examine its communication stack layout and formalize the communication abstraction that underlying communication libraries need to support. We then analyze its current MPI backend (parcelport) and identify four categories of needs that are not typical in the BSP model and are not well covered by the MPI standard. To bridge these gaps, we design a new parcelport starting from the native network layer, built on an experimental communication library, LCI. It incorporates several techniques: one-sided communication, queue-based completion notification, explicit progressing, and multiple approaches to mitigating resource contention. Overall, the resulting LCI parcelport outperforms the existing MPI parcelport by up to 50x in microbenchmarks and 2x in a real-world application. Using it as a testbed, we design LCI parcelport variants to quantify the performance contribution of each technique. This work combines conceptual analysis and experimental results to offer practical guidelines for the future development of communication libraries and AMT communication layers.