DAPHNE is a new open-source software infrastructure designed to address the increasing demands of integrated data analysis (IDA) pipelines, which combine data management (DM), high-performance computing (HPC), and machine learning (ML) systems. Executing IDA pipelines efficiently is challenging because of their diverse computing characteristics and demands. IDA pipelines executed with the DAPHNE infrastructure therefore require an efficient and versatile scheduler. This work introduces DaphneSched, the task-based scheduler at the core of DAPHNE. DaphneSched is versatile: it incorporates eleven task partitioning and three task assignment techniques, bringing state-of-the-art task scheduling closer to the state of the practice. To showcase DaphneSched's effectiveness in scheduling IDA pipelines, we evaluate its performance on two applications: a product recommendation system and the training of a linear regression model. We conduct performance experiments on multicore platforms with 20 and 56 cores. The results show that the versatility of DaphneSched enables combinations of scheduling strategies that outperform commonly used scheduling techniques by up to 13%. This work confirms the benefits of employing DaphneSched for the efficient execution of applications with IDA pipelines.
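To make the distinction between task partitioning and task assignment concrete, the following is a minimal sketch and not DaphneSched's implementation. The abstract does not name the eleven partitioning or three assignment techniques, so the sketch assumes two classic partitioning schemes (static chunking and guided self-scheduling) and one assignment scheme (a single centralized queue from which the next free worker takes the next chunk). All function names and the synthetic task costs are hypothetical.

```python
import heapq
import math
from typing import List, Sequence


def static_chunks(num_tasks: int, num_workers: int) -> List[int]:
    """Partitioning (assumed scheme): one near-equal chunk per worker."""
    base, extra = divmod(num_tasks, num_workers)
    sizes = [base + (1 if i < extra else 0) for i in range(num_workers)]
    return [s for s in sizes if s > 0]


def guided_chunks(num_tasks: int, num_workers: int, min_chunk: int = 1) -> List[int]:
    """Partitioning (assumed scheme): guided self-scheduling.

    Each chunk is ceil(remaining / num_workers), so chunks shrink as work
    runs out, which helps balance load when task costs are uneven.
    """
    chunks = []
    remaining = num_tasks
    while remaining > 0:
        size = max(min_chunk, math.ceil(remaining / num_workers))
        size = min(size, remaining)
        chunks.append(size)
        remaining -= size
    return chunks


def makespan(chunks: Sequence[int], task_costs: Sequence[float], num_workers: int) -> float:
    """Assignment (assumed scheme): centralized queue.

    Chunks are handed out in order; each goes to the worker that becomes
    free first. Returns the finish time of the last worker.
    """
    free_at = [0.0] * num_workers
    heapq.heapify(free_at)
    start = 0
    for size in chunks:
        work = sum(task_costs[start:start + size])
        start += size
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + work)
    return max(free_at)


if __name__ == "__main__":
    workers = 4
    # Synthetic, increasingly expensive tasks (illustrative data only).
    costs = [float(i) for i in range(100)]
    for name, chunks in [("static", static_chunks(100, workers)),
                         ("guided", guided_chunks(100, workers))]:
        print(name, len(chunks), "chunks, makespan =", makespan(chunks, costs, workers))
```

With uneven task costs, the partitioning choice changes how evenly the same assignment scheme can spread work across cores, which is why a scheduler that exposes many combinations can find one that suits a given pipeline and platform.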