Lack of explainability is a key factor limiting the practical adoption of high-performing Deep Reinforcement Learning (DRL) controllers. Explainable RL for networking has hitherto used salient input features to interpret a controller's behavior. However, these feature-based solutions do not completely explain the controller's decision-making process. Often, operators are interested in understanding the future impact of a controller's actions on performance, which feature-based solutions cannot capture. In this paper, we present CrystalBox, a framework that explains a controller's behavior in terms of the future impact on key network performance metrics. CrystalBox employs a novel learning-based approach to generate succinct and expressive explanations. We use reward components of the DRL network controller, which are key performance metrics meaningful to operators, as the basis for explanations. CrystalBox is generalizable and can work across both discrete and continuous control environments without any changes to the controller or the DRL workflow. Using adaptive bitrate streaming and congestion control, we demonstrate CrystalBox's ability to generate high-fidelity future-based explanations. We additionally present three practical use cases of CrystalBox: cross-state explainability, guided reward design, and network observability.
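To make the idea of explaining behavior through reward components more concrete, the following is a minimal sketch, not CrystalBox's learned predictor. It only computes the discounted future sum of each reward component over an already recorded trajectory, which is one plausible form of the quantity a future-based explanation would report. The component names (`quality`, `stall`, `smoothness`), the discount factor `gamma`, and the function name `decomposed_returns` are assumptions chosen for illustration in an adaptive bitrate setting; none of them come from the abstract.

```python
from typing import Dict, List


def decomposed_returns(rewards: List[Dict[str, float]], gamma: float = 0.9) -> Dict[str, float]:
    """Discounted sum of each reward component over a recorded trajectory.

    `rewards[t]` maps a (hypothetical) component name to its value at step t.
    """
    totals: Dict[str, float] = {}
    for t, step in enumerate(rewards):
        for name, value in step.items():
            # Accumulate gamma^t * r_t separately for every component.
            totals[name] = totals.get(name, 0.0) + (gamma ** t) * value
    return totals


# Hypothetical two-step trajectory following some action of the controller.
trajectory = [
    {"quality": 1.0, "stall": 0.0, "smoothness": -0.5},
    {"quality": 2.0, "stall": -1.0, "smoothness": 0.0},
]
explanation = decomposed_returns(trajectory, gamma=0.5)
```

Under these assumptions, `explanation` holds one number per performance metric rather than a single scalar return, so an operator could see, for instance, that an action is expected to improve quality at the cost of stalls.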