In emerging scientific computing environments, matrix computations of increasing size and complexity are becoming prevalent. However, contemporary matrix language implementations offer insufficient support for the efficient use of cloud computing resources, particularly on the user side. We therefore developed an extension of the Julia high-performance computing language in which matrix computations are automatically parallelized in the cloud, so that users are shielded from interacting directly with complex, explicitly parallel computations. We implement lazy evaluation semantics combined with directed graphs to optimize matrix operations on the fly, while dynamic simulation finds the optimal tile size and schedule for a given cluster of cloud nodes. To enable these simulations, we construct a time model that predicts the cluster's performance capacity. Automatic configuration of communication and worker processes on the cloud network allows the framework to scale up automatically to clusters of heterogeneous nodes. Our experimental evaluation of the framework comprises eleven benchmarks on a fourteen-node (564 CPUs) cluster in the AWS public cloud, showing speedups of up to a factor of 5.1 and, on average, 74.39% of the upper bound on speedup.
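To make the idea of lazy evaluation over a directed graph concrete, the following is a minimal sketch in Python rather than Julia, and it is not the framework's actual implementation. The names `Lazy`, `leaf`, and `evaluate` are hypothetical, matrices are plain nested lists, and tiling, scheduling, and distribution are omitted. It only illustrates how deferring computation exposes a graph in which a shared subexpression can be computed once.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive dense product; zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def _add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class Lazy:
    """Hypothetical node in an expression graph; operators only record work."""

    def __init__(self, op: str, inputs: Tuple["Lazy", ...] = (),
                 value: Optional[Matrix] = None) -> None:
        self.op = op
        self.inputs = inputs
        self.value = value  # set for leaves, or cached after evaluation

    def __matmul__(self, other: "Lazy") -> "Lazy":
        return Lazy("matmul", (self, other))

    def __add__(self, other: "Lazy") -> "Lazy":
        return Lazy("add", (self, other))


def leaf(m: Matrix) -> Lazy:
    return Lazy("leaf", value=m)


def evaluate(node: Lazy, counts: Dict[str, int]) -> Matrix:
    # Walk the graph on demand; a node that already has a value is reused.
    if node.value is not None:
        return node.value
    args = [evaluate(i, counts) for i in node.inputs]
    counts[node.op] = counts.get(node.op, 0) + 1
    node.value = _matmul(*args) if node.op == "matmul" else _add(*args)
    return node.value


A = leaf([[1, 2], [3, 4]])
B = leaf([[0, 1], [1, 0]])
P = A @ B        # nothing is computed yet
C = P + P        # P appears twice in the graph
counts: Dict[str, int] = {}
result = evaluate(C, counts)
```

Because `P` is a single node referenced twice, the product is evaluated once and then reused. A real system would apply further rewrites on the graph before scheduling tiles across workers.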