Scientific workflows are designed as directed acyclic graphs (DAGs) and consist of multiple dependent task definitions. They are executed over large amounts of data, often resulting in thousands of tasks with heterogeneous compute requirements and long runtimes, even on cluster infrastructures. To optimize workflow performance, sufficient resources, e.g., CPU and memory, need to be provisioned for the respective tasks. Typically, workflow systems rely on user-provided resource estimates, which are known to be highly error-prone and can result in over- or underprovisioning. While overprovisioning leads to high resource wastage, underprovisioning can result in long runtimes or even failed tasks. In this paper, we propose two reinforcement learning approaches, based on gradient bandits and Q-learning, respectively, that aim to minimize resource wastage by selecting suitable CPU and memory allocations. We provide a prototypical implementation in the well-known scientific workflow management system Nextflow, evaluate our approaches with five workflows, and compare them against the default resource configurations and a state-of-the-art feedback loop baseline. The evaluation shows that our reinforcement learning approaches significantly reduce resource wastage compared to the default configuration. Further, our approaches reduce the allocated CPU hours compared to the state-of-the-art feedback loop by 6.79% and 24.53%.
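To make the gradient bandit idea concrete, the following is a minimal sketch of the textbook gradient bandit update applied to choosing a CPU allocation for a single task type. It is not the paper's implementation: the class name `GradientBandit`, the candidate allocations, the step size, and in particular the reward function `toy_reward` (negative wasted CPUs, with a fixed penalty for underprovisioning) are assumptions chosen purely for illustration.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into selection probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class GradientBandit:
    """Textbook gradient bandit over a fixed set of candidate allocations."""

    def __init__(self, actions, step_size=0.1, seed=0):
        self.actions = list(actions)            # e.g. candidate CPU counts
        self.prefs = [0.0] * len(self.actions)  # one preference per action
        self.step_size = step_size
        self.avg_reward = 0.0                   # running baseline
        self.steps = 0
        self.rng = random.Random(seed)

    def select(self):
        """Sample an action index according to the softmax policy."""
        probs = softmax(self.prefs)
        return self.rng.choices(range(len(self.actions)), weights=probs, k=1)[0]

    def update(self, index, reward):
        """Shift preferences toward actions that beat the baseline."""
        probs = softmax(self.prefs)
        advantage = reward - self.avg_reward
        for i, p in enumerate(probs):
            indicator = 1.0 if i == index else 0.0
            self.prefs[i] += self.step_size * advantage * (indicator - p)
        # Update the baseline as the sample average of observed rewards.
        self.steps += 1
        self.avg_reward += (reward - self.avg_reward) / self.steps


def toy_reward(allocated_cpus, used_cpus, penalty=8.0):
    """Hypothetical reward: negative wastage, fixed penalty if underprovisioned."""
    if allocated_cpus < used_cpus:
        return -penalty
    return -float(allocated_cpus - used_cpus)


def run_demo(rounds=2000, used_cpus=2):
    """Simulate repeated executions of a task that actually uses `used_cpus`."""
    bandit = GradientBandit(actions=[1, 2, 4, 8])
    for _ in range(rounds):
        idx = bandit.select()
        bandit.update(idx, toy_reward(bandit.actions[idx], used_cpus))
    return bandit


if __name__ == "__main__":
    trained = run_demo()
    for cpus, prob in zip(trained.actions, softmax(trained.prefs)):
        print(f"{cpus} CPUs: selection probability {prob:.3f}")
```

In this sketch, each completed task execution yields one reward, and the preferences drift toward allocations whose reward exceeds the running average. A Q-learning variant would additionally condition the choice on a state, which this stateless sketch does not model.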