Explainable Reinforcement Learning (XRL) can provide transparency into the decision-making process of a Deep Reinforcement Learning (DRL) model and increase user trust and adoption in real-world use cases. By applying XRL techniques, researchers can identify potential vulnerabilities in a trained DRL model before deployment, thereby limiting the potential for mission failure or system mistakes. This paper introduces the ARLIN (Assured RL Model Interrogation) Toolkit, an open-source Python library that identifies potential vulnerabilities and critical points within trained DRL models through detailed, human-interpretable explainability outputs. To illustrate ARLIN's effectiveness, we provide explainability visualizations and vulnerability analysis for a publicly available DRL model. The open-source code repository is available for download at https://github.com/mitre/arlin.