Interpretable Reinforcement Learning for Robotics and Continuous Control

Rohan Paleja, Letian Chen, Yaru Niu, Andrew Silva, Zhaoxin Li, Songan Zhang, Chace Ritchie, Sugju Choi, Kimberlee Chestnut Chang, Hongtei Eric Tseng, Yan Wang, Subramanya Nageshrao, Matthew Gombolay

From arXiv. arXiv admin note: text overlap with arXiv:2202.02352.

Interpretability in machine learning is critical for the safe deployment of learned policies across legally regulated and safety-critical domains. While gradient-based approaches in reinforcement learning have achieved tremendous success in learning policies for continuous control problems such as robotics and autonomous driving, the lack of interpretability is a fundamental barrier to adoption. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based reinforcement learning approaches to produce high-performing, interpretable policies. The key to our approach is a procedure that allows direct optimization in a sparse, decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs can learn policies that match or outperform baselines by up to 33% in autonomous driving scenarios while achieving a 300x-600x reduction in the number of parameters relative to deep learning baselines. We prove that ICCTs can serve as universal function approximators and show analytically that ICCTs can be verified in linear time. Furthermore, we deploy ICCTs in two realistic driving domains, based on Interstate Highways 94 and 280 in the US. Finally, we verify ICCTs' utility with end users and find that ICCTs are rated easier to simulate, quicker to validate, and more interpretable than neural networks.
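The abstract names the ingredients (a sparse, decision-tree-like policy for continuous actions that can be verified efficiently) without showing one. As a rough illustration only, and not the authors' ICCT implementation, the sketch below assumes a crisp tree in which each node tests one feature against a threshold and each leaf is a linear controller over a single feature. `Leaf`, `Node`, `act`, and `action_range` are hypothetical names. Acting follows a single root-to-leaf path, and bounding the action over a box of inputs visits each node at most once.

```python
# Hypothetical illustration of a sparse tree policy; not the ICCT reference implementation.
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """Assumed sparse linear sub-controller: action = weight * x[feature] + bias."""
    feature: int
    weight: float
    bias: float


@dataclass
class Node:
    """Crisp decision node: go left if x[feature] > threshold, else go right."""
    feature: int
    threshold: float
    left: Tree
    right: Tree


Tree = Union[Node, Leaf]


def act(tree: Tree, x: List[float]) -> float:
    """Follow one root-to-leaf path, then apply that leaf's sparse controller."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] > tree.threshold else tree.right
    return tree.weight * x[tree.feature] + tree.bias


def action_range(tree: Tree, lo: List[float], hi: List[float]) -> Tuple[float, float]:
    """Bound the action over the input box lo <= x <= hi (assumes lo[i] <= hi[i]).

    Each node is visited at most once. The left branch's open constraint
    x[f] > t is relaxed to x[f] >= t, so the bound is sound but may be
    slightly conservative at split boundaries.
    """
    if isinstance(tree, Leaf):
        # A one-feature linear controller attains its extremes at the box endpoints.
        a = tree.weight * lo[tree.feature] + tree.bias
        b = tree.weight * hi[tree.feature] + tree.bias
        return (min(a, b), max(a, b))
    f, t = tree.feature, tree.threshold
    ranges = []
    if hi[f] > t:  # part of the box satisfies x[f] > t
        left_lo = list(lo)
        left_lo[f] = max(lo[f], t)
        ranges.append(action_range(tree.left, left_lo, hi))
    if lo[f] <= t:  # part of the box satisfies x[f] <= t
        right_hi = list(hi)
        right_hi[f] = min(hi[f], t)
        ranges.append(action_range(tree.right, lo, right_hi))
    return (min(r[0] for r in ranges), max(r[1] for r in ranges))


# Toy two-leaf policy over a hypothetical 2-D observation.
tree = Node(feature=0, threshold=0.5,
            left=Leaf(feature=0, weight=-2.0, bias=0.0),
            right=Leaf(feature=1, weight=0.5, bias=0.0))

print(act(tree, [1.0, 4.0]))                           # -2.0
print(act(tree, [0.0, 4.0]))                           # 2.0
print(action_range(tree, [0.0, 0.0], [1.0, 4.0]))      # (-2.0, 2.0)
```

Restricting each leaf to one feature keeps every controller readable at a glance and lets the bound be computed from box endpoints alone. The gradient-based training procedure the abstract refers to is not reproduced here.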
