Data-intensive scientific workflows increasingly rely on high-performance computing (HPC) systems, complementing traditional Grid and Cloud platforms. However, workflow scheduling on HPC infrastructures remains challenging due to the prevalence of non-uniform memory access (NUMA) architectures. These systems require schedulers to account for data locality not only across distributed environments but also within each node. Modern HPC nodes integrate multiple NUMA domains and heterogeneous memory regions, such as high-bandwidth memory (HBM) and DRAM, and frequently attach accelerators (GPUs or FPGAs) and network interface cards (NICs) to specific NUMA nodes. This design increases the variability of data-access latency and complicates the placement of both tasks and data. Despite these constraints, most workflow scheduling strategies were originally developed for Grid or Cloud environments and rarely incorporate NUMA-aware considerations. To address this gap, this work introduces nFlows, a NUMA-aware Workflow Execution Runtime System that enables the modeling, bare-metal execution, simulation, and validation of scheduling algorithms for data-intensive workflows on NUMA-based HPC systems. The system's design, implementation, and validation methodology are presented. nFlows supports the construction of simulation models and their direct execution on physical systems, enabling studies of NUMA effects on scheduling, the design of NUMA-aware algorithms, the analysis of data-movement behavior, the identification of performance bottlenecks, and the exploration of in-memory workflow execution.