Robustness and safety are critical for the trustworthy deployment of deep reinforcement learning in real-world decision-making applications. In particular, we require algorithms that can guarantee robust, safe performance in the presence of general environment disturbances, while making limited assumptions on the data collection process during training. In this work, we propose a safe reinforcement learning framework with robustness guarantees through the use of an optimal transport cost uncertainty set. We provide an efficient, theoretically supported implementation based on Optimal Transport Perturbations, which can be applied in a completely offline fashion using only data collected in a nominal training environment. We demonstrate the robust, safe performance of our approach on a variety of continuous control tasks with safety constraints in the Real-World Reinforcement Learning Suite.