Probabilistic Guarantees for Safe Reinforcement Learning in Continuous Action Spaces via Temporal Logic

Vanilla Reinforcement Learning (RL) can efficiently solve complex tasks but does not provide any guarantees on system behavior. Yet, for real-world systems, which are often safety-critical, such guarantees on safety specifications are necessary. To bridge this gap, we propose a safe RL procedure for continuous action spaces with verified probabilistic guarantees specified via temporal logic. First, our approach probabilistically verifies a candidate controller with respect to a temporal logic specification while randomizing the controller's inputs within an expansion set. Then, we use RL to improve the performance of this probabilistically verified controller and explore in the given expansion set around the controller's input. Finally, we calculate probabilistic safety guarantees with respect to temporal logic specifications for the learned agent. Our approach is efficiently implementable for continuous action and state spaces and separates safety verification and performance improvement into two distinct steps. We evaluate our approach on an evasion task where a robot has to reach a goal while evading a dynamic obstacle with a specific maneuver. Our results show that our safe RL approach leads to efficient learning while probabilistically maintaining safety specifications.
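The first step described above, verifying a candidate controller while its inputs are randomized within an expansion set, can be illustrated with a small sketch. The abstract does not say how the verification is carried out, so this example swaps in plain Monte Carlo sampling with a Hoeffding confidence bound as a stand-in. The 1-D dynamics, the proportional controller, the safety bound, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def base_controller(x):
    """Hypothetical candidate controller: proportional feedback toward 0."""
    return -0.5 * x


def step(x, u, rng):
    """Toy 1-D dynamics with a small bounded disturbance (an assumption)."""
    return x + u + rng.uniform(-0.05, 0.05)


def rollout_is_safe(eps, horizon, bound, rng):
    """Check the finite-horizon spec G(|x| <= bound) on one rollout.

    Each action is the controller output plus a perturbation drawn
    uniformly from the expansion set [-eps, eps].
    """
    x = rng.uniform(-0.5, 0.5)
    for _ in range(horizon):
        if abs(x) > bound:
            return False
        u = base_controller(x) + rng.uniform(-eps, eps)
        x = step(x, u, rng)
    return abs(x) <= bound


def estimate_safety(eps, n=2000, horizon=50, bound=1.0, delta=0.01, seed=0):
    """Return (empirical safe rate, Hoeffding lower bound at confidence 1 - delta)."""
    rng = random.Random(seed)
    safe = sum(rollout_is_safe(eps, horizon, bound, rng) for _ in range(n))
    p_hat = safe / n
    # One-sided Hoeffding bound: p >= p_hat - sqrt(ln(1/delta) / (2n)).
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return p_hat, max(0.0, p_hat - margin)


if __name__ == "__main__":
    for eps in (0.1, 2.0):
        p_hat, lower = estimate_safety(eps)
        print(f"eps={eps}: empirical={p_hat:.3f}, lower bound={lower:.3f}")
```

In this toy setting, a small expansion set keeps every sampled rollout inside the safe region, while a large one does not. That is the trade-off an expansion set has to balance: more room for RL exploration against a weaker safety guarantee.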

