Deep reinforcement learning has recently been explored for cluster scheduling in high-performance computing (HPC), an approach known as DRL scheduling, and has shown promising results. A significant obstacle, however, is that deep neural networks (DNNs) lack interpretability and are therefore black-box models to system managers, which hinders the practical deployment of DRL scheduling. In this work, we present IRL (Interpretable Reinforcement Learning), a framework that addresses the interpretability of DRL scheduling. The core idea is to use imitation learning to interpret the DNN (i.e., the DRL policy) as a decision tree. Unlike DNNs, decision tree models are non-parametric and easy for humans to comprehend. To extract an effective and efficient decision tree, IRL incorporates the Dataset Aggregation (DAgger) algorithm and introduces the notion of a critical state to prune the derived tree. Through trace-based experiments, we demonstrate that IRL can convert a black-box DNN policy into an interpretable rule-based decision tree while maintaining comparable scheduling performance. Additionally, IRL can inform the setting of rewards in DRL scheduling.