Although load balancing in distributed-memory computing has been well studied, we present a new approach to the problem: a unified, reduced-order model that combines three key components (computation, communication, and memory) to describe "work" in a distributed system. Our model enables an optimizer to explore complex tradeoffs in task placement, such as gaining parallelism at the expense of data replication, which increases memory usage. We propose a fully distributed, heuristic-based load-balancing algorithm and demonstrate that it quickly finds close-to-optimal solutions. We also formalize the optimization problem as a mixed-integer linear program and compare it with our heuristic strategy. Finally, we show that, when applied to an electromagnetics code, our approach obtains speedups of up to 2.3x for the imbalanced execution.
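To make the placement tradeoff concrete, the following is a minimal sketch of how a work model with computation, communication, and memory terms might score a task placement. It is not the model or the algorithm of this paper: the names `Task` and `evaluate`, the weights `alpha` and `beta`, the per-rank memory cap `mem_cap`, and the rule that off-rank communication is charged to both endpoints are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    load: float          # estimated compute time of the task (hypothetical unit)
    blocks: frozenset    # ids of the shared data blocks the task reads


def evaluate(placement, tasks, block_bytes, edges, n_ranks,
             alpha=1.0, beta=1.0, mem_cap=float("inf")):
    """Return (per-rank work, per-rank memory, feasible) for a placement.

    placement: dict mapping task id -> rank.
    edges: list of (task_a, task_b, volume) communication links.
    All weights and the memory cap are illustrative assumptions.
    """
    compute = [0.0] * n_ranks
    comm = [0.0] * n_ranks
    resident = [set() for _ in range(n_ranks)]

    for tid, rank in placement.items():
        compute[rank] += tasks[tid].load
        # A block needed on several ranks is replicated on each of them.
        resident[rank] |= tasks[tid].blocks

    for a, b, volume in edges:
        ra, rb = placement[a], placement[b]
        if ra != rb:
            # Only off-rank communication is charged, to both endpoints.
            comm[ra] += volume
            comm[rb] += volume

    memory = [sum(block_bytes[b] for b in blocks) for blocks in resident]
    work = [alpha * c + beta * m for c, m in zip(compute, comm)]
    feasible = all(m <= mem_cap for m in memory)
    return work, memory, feasible


# Toy instance: tasks 0 and 1 read block "A", tasks 2 and 3 read block "B".
tasks = {
    0: Task(3.0, frozenset({"A"})),
    1: Task(1.0, frozenset({"A"})),
    2: Task(1.0, frozenset({"B"})),
    3: Task(1.0, frozenset({"B"})),
}
block_bytes = {"A": 6, "B": 6}
edges = [(0, 1, 0.5)]

colocated = {0: 0, 1: 0, 2: 1, 3: 1}   # no replication, imbalanced compute
balanced = {0: 0, 1: 1, 2: 1, 3: 1}    # balanced compute, block "A" replicated
```

In this toy instance the `balanced` placement lowers the maximum per-rank work but requires rank 1 to hold both blocks, so it is only feasible when `mem_cap` is large enough. This is the kind of parallelism-versus-replication tradeoff the abstract describes.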