Many real-world domains require safe decision making in the presence of uncertainty. In this work, we propose a deep reinforcement learning framework to address this problem. We take a risk-averse perspective toward model uncertainty through the use of coherent distortion risk measures, and we show that our formulation is equivalent to a distributionally robust safe reinforcement learning problem with robustness guarantees on both performance and safety. We propose an efficient implementation that requires access to only a single training environment, and we demonstrate that our framework achieves robust and safe performance on a variety of continuous control tasks with safety constraints in the Real-World Reinforcement Learning Suite.
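The following minimal Python sketch makes the notion of a distortion risk measure concrete. It is not the paper's implementation. It assumes a finite sample of episode returns (for example, returns of one policy evaluated under several plausible environment models) and that higher returns are better. The names `distortion_risk`, `cvar_distortion`, and `wang_distortion` are hypothetical. CVaR and the Wang transform are standard examples of concave distortions, chosen here only for illustration.

```python
from statistics import NormalDist
from typing import Callable, Sequence

# A distortion h maps [0, 1] -> [0, 1] with h(0) = 0 and h(1) = 1.
# A concave, nondecreasing h puts extra weight on low quantiles, which is
# risk-averse when the random variable is a return (higher is better).
Distortion = Callable[[float], float]


def cvar_distortion(alpha: float) -> Distortion:
    """Hypothetical helper: distortion whose risk measure is the mean of the worst alpha-fraction."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return lambda u: min(u / alpha, 1.0)


def wang_distortion(eta: float) -> Distortion:
    """Hypothetical helper: Wang transform h(u) = Phi(Phi^{-1}(u) + eta), concave for eta >= 0."""
    normal = NormalDist()

    def h(u: float) -> float:
        # inv_cdf is undefined at 0 and 1, so fix the endpoints explicitly.
        if u <= 0.0:
            return 0.0
        if u >= 1.0:
            return 1.0
        return normal.cdf(normal.inv_cdf(u) + eta)

    return h


def distortion_risk(returns: Sequence[float], h: Distortion) -> float:
    """Distortion risk measure of an empirical return distribution (illustrative sketch).

    Returns are sorted ascending and the i-th smallest gets weight
    h(i/n) - h((i-1)/n). The weights are nonnegative and sum to
    h(1) - h(0) = 1, so the result is an expectation under a reweighted,
    pessimistic distribution over the samples.
    """
    xs = sorted(returns)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one return sample")
    weights = [h(i / n) - h((i - 1) / n) for i in range(1, n + 1)]
    return sum(w * x for w, x in zip(weights, xs))


if __name__ == "__main__":
    returns = [10.0, 4.0, 8.0, 2.0]
    print(distortion_risk(returns, cvar_distortion(1.0)))  # 6.0 (plain mean)
    print(distortion_risk(returns, cvar_distortion(0.5)))  # 3.0 (mean of worst half)
    print(distortion_risk(returns, wang_distortion(0.75)))
```

The weights form a probability vector that shifts mass toward low returns. The risk value can therefore be read as an expectation under a pessimistic reweighting of the samples. This offers some intuition for why a risk-averse objective of this kind can be related to a distributionally robust one.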