Process malleability has been shown to have a highly positive impact on resource utilization and global productivity in data centers compared with the conventional static resource allocation policy. However, the non-negligible additional development effort it imposes has constrained its adoption by the scientific programming community. In this work, we present DMRlib, a library designed to offer the global advantages of process malleability while providing a minimalist MPI-like syntax. The library includes a series of predefined communication patterns that greatly ease the development of malleable applications. In addition, we deploy several scenarios featuring different scalability patterns to demonstrate the positive impact of process malleability. Concretely, we study two job submission modes (rigid and moldable) in order to identify the best-case scenarios for malleability, using metrics such as resource allocation rate, completed jobs per second, and energy consumption. The experiments show that our elastic approach can improve global throughput by a factor of more than 3 compared with traditional workloads of non-malleable jobs.