Utilizing Explainability Techniques for Reinforcement Learning Model Assurance

From arXiv: 9 pages, 8 figures, including appendices (A, B, C). Accepted as a poster presentation in the demo track at the "XAI in Action: Past, Present, and Future Applications" workshop at NeurIPS 2023. MITRE Public Release Case Number 23-3095.

Explainable Reinforcement Learning (XRL) can provide transparency into the decision-making process of a Deep Reinforcement Learning (DRL) model, increasing user trust and adoption in real-world use cases. By using XRL techniques, researchers can identify potential vulnerabilities in a trained DRL model before deployment, thereby reducing the risk of mission failure or errors by the system. This paper introduces the ARLIN (Assured RL Model Interrogation) Toolkit, an open-source Python library that identifies potential vulnerabilities and critical points within trained DRL models through detailed, human-interpretable explainability outputs. To illustrate ARLIN's effectiveness, we provide explainability visualizations and a vulnerability analysis for a publicly available DRL model. The open-source code repository is available for download at https://github.com/mitre/arlin.
