Deep Reinforcement Learning (DRL) is a frequently employed technique for solving scheduling problems. Although DRL agents excel at delivering viable results in short computing times, their reasoning remains opaque. We conduct a case study in which we systematically apply two explainable AI (xAI) frameworks, namely SHAP (DeepSHAP) and Captum (Input x Gradient), to describe the reasoning behind the scheduling decisions of a specialized DRL agent in a flow production. We find that methods in the xAI literature lack falsifiability and consistent terminology, do not adequately consider domain knowledge, the target audience, or real-world scenarios, and typically provide simple input-output explanations rather than causal interpretations. To address these issues, we introduce a hypothesis-based workflow. This approach enables us to inspect whether explanations align with domain knowledge and match the reward hypotheses of the agent. We furthermore tackle the challenge of communicating these insights to third parties by tailoring hypotheses to the target audience, so that, once verified, they can serve as interpretations of the agent's behavior. Our proposed workflow emphasizes the repeated verification of explanations and may be applicable to a variety of DRL-based scheduling use cases.