Supercomputers have revolutionized how industry and science process large amounts of data. These machines combine hundreds or thousands of computing nodes that work together to execute time-consuming programs requiring substantial computational resources. Over the years, supercomputers have incorporated new and diverse technologies, making them heterogeneous. Executing a program in a heterogeneous environment, however, requires attention to a specific source of performance degradation: load imbalance. In this research, we address the challenges of load imbalance when scheduling many homogeneous tasks in a heterogeneous environment. To this end, we introduce adaptive asynchronous work-stealing. This approach collects information about the nodes and uses it to improve aspects of work-stealing such as victim selection and task offloading. The proposed approach also eliminates the need for extra threads to communicate this information, thereby reducing the overhead of a fully asynchronous implementation. Our experimental results demonstrate a performance improvement of approximately 10.1% compared to other conventional and state-of-the-art implementations.
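To make the idea of information-driven victim selection and task offloading concrete, the following is a minimal Python sketch. It is not the implementation evaluated in this work. The record `NodeInfo`, the helpers `select_victim` and `tasks_to_steal`, and the heuristics themselves (pick the peer with the longest estimated drain time, then split its queue in proportion to the two nodes' processing rates) are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeInfo:
    """Hypothetical record of what a thief knows about a node."""
    node_id: int
    queued_tasks: int         # tasks currently waiting on the node
    tasks_per_second: float   # observed processing rate of the node


def estimated_drain_time(node: NodeInfo) -> float:
    """Seconds the node would need to finish its own queue."""
    return node.queued_tasks / node.tasks_per_second


def select_victim(thief_id: int, nodes: List[NodeInfo]) -> Optional[NodeInfo]:
    """Assumed heuristic: steal from the peer that would finish last."""
    candidates = [
        n for n in nodes
        if n.node_id != thief_id and n.queued_tasks > 1
    ]
    if not candidates:
        return None
    return max(candidates, key=estimated_drain_time)


def tasks_to_steal(thief: NodeInfo, victim: NodeInfo) -> int:
    """Assumed heuristic: split the victim's queue by relative speed,
    so that both nodes finish the shared work at about the same time."""
    share = thief.tasks_per_second / (
        thief.tasks_per_second + victim.tasks_per_second
    )
    amount = int(victim.queued_tasks * share)
    # Always leave the victim at least one task, and steal at least one.
    return max(1, min(amount, victim.queued_tasks - 1))
```

With homogeneous tasks, queue length divided by processing rate is a reasonable proxy for remaining work, which is why the sketch uses it. A uniformly random victim choice would ignore the speed differences between heterogeneous nodes.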