Many real-world domains require safe decision making in uncertain environments. In this work, we introduce a deep reinforcement learning framework for approaching this important problem. We consider a distribution over transition models, and apply a risk-averse perspective towards model uncertainty through the use of coherent distortion risk measures. We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems. Unlike existing approaches to robustness in deep reinforcement learning, however, our formulation does not involve minimax optimization. This leads to an efficient, model-free implementation of our approach that only requires standard data collection from a single training environment. In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.