Supercomputers have revolutionized how industries and scientific fields process large amounts of data. These machines group hundreds or thousands of computing nodes that work together to execute time-consuming programs requiring large amounts of computational resources. Over the years, supercomputers have expanded to include new and different technologies, making them heterogeneous. However, executing a program in a heterogeneous environment requires attention to a specific source of performance degradation: load imbalance. In this research, we address the challenges of load imbalance when scheduling many homogeneous tasks in a heterogeneous environment. To this end, we introduce the concept of adaptive asynchronous work-stealing. This approach collects information about the nodes and uses it to improve aspects of work-stealing such as victim selection and task offloading. Additionally, the proposed approach eliminates the need for extra threads to communicate information, thereby reducing the overhead of a fully asynchronous implementation. Our experimental results demonstrate a performance improvement of approximately 10.1% compared to other conventional and state-of-the-art implementations.