Overcoming Exploration: Deep Reinforcement Learning for Continuous Control in Cluttered Environments from Temporal Logic Specifications

Model-free continuous control for robot navigation with Deep Reinforcement Learning (DRL), which relies on noisy policies for exploration, is highly sensitive to reward density. In practice, robots are usually deployed in cluttered environments that contain many obstacles and narrow passageways. Designing effective dense rewards for such environments is challenging, which leads to exploration problems during training. The problem becomes even more severe when tasks are described by temporal logic specifications. This work presents a deep policy gradient algorithm for controlling a robot with unknown dynamics operating in a cluttered environment when the task is specified as a Linear Temporal Logic (LTL) formula. To overcome the exploration challenge that the environment poses during training, we propose a novel path-planning-guided reward scheme that integrates sampling-based methods to effectively complete goal-reaching missions. To facilitate LTL satisfaction, our approach decomposes the LTL mission into sub-goal-reaching tasks that are solved in a distributed manner. Our framework is shown to significantly improve performance (effectiveness and efficiency) and exploration for robots tasked with complex missions in large-scale cluttered environments. A video demonstration is available on YouTube: https://youtu.be/yMh_NUNWxho.
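
The abstract does not give the reward formula, so the following is only a minimal sketch of the general idea of a path-planning-guided dense reward, not the authors' method. It assumes a sampling-based planner (for example RRT*) has already returned a collision-free waypoint path, and it rewards the decrease in remaining path length rather than the decrease in straight-line distance to the goal. In a cluttered map the straight-line distance can shrink while the robot drives into a dead end behind an obstacle; progress measured along a collision-free path does not reward that. The helpers `remaining_path_length`, `path_guided_reward`, and `next_subgoal`, and the parameters `goal_radius` and `goal_bonus`, are hypothetical. `next_subgoal` also assumes the LTL mission has already been reduced to an ordered list of sub-goal positions, which is a simplification of the distributed decomposition described above.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def remaining_path_length(path: Sequence[Point], pos: Point) -> float:
    """Arc length left along the polyline `path` (>= 2 waypoints), measured from
    the point on the path closest to `pos` to the final waypoint."""
    # suffix[i] = path length from waypoint i to the last waypoint
    suffix = [0.0] * len(path)
    for i in range(len(path) - 2, -1, -1):
        suffix[i] = suffix[i + 1] + math.dist(path[i], path[i + 1])

    best_dist, best_remaining = math.inf, suffix[0]
    for i in range(len(path) - 1):
        (ax, ay), (bx, by) = path[i], path[i + 1]
        dx, dy = bx - ax, by - ay
        seg_sq = dx * dx + dy * dy
        # Clamp the projection parameter so the closest point stays on segment i
        t = 0.0 if seg_sq == 0 else ((pos[0] - ax) * dx + (pos[1] - ay) * dy) / seg_sq
        t = max(0.0, min(1.0, t))
        proj = (ax + t * dx, ay + t * dy)
        d = math.dist(pos, proj)
        if d < best_dist:
            best_dist = d
            best_remaining = math.dist(proj, path[i + 1]) + suffix[i + 1]
    return best_remaining


def path_guided_reward(path: Sequence[Point], prev_pos: Point, pos: Point,
                       goal_radius: float = 0.5, goal_bonus: float = 10.0) -> float:
    """Hypothetical dense reward: decrease in remaining path length over one step,
    plus a bonus when the robot is within `goal_radius` of the final waypoint."""
    progress = remaining_path_length(path, prev_pos) - remaining_path_length(path, pos)
    reached = math.dist(pos, path[-1]) <= goal_radius
    return progress + (goal_bonus if reached else 0.0)


def next_subgoal(subgoals: List[Point], idx: int, pos: Point, radius: float = 0.5) -> int:
    """Hypothetical sequencing: move to the next sub-goal once the robot is
    within `radius` of the active one; stays put after the last sub-goal."""
    if idx < len(subgoals) and math.dist(pos, subgoals[idx]) <= radius:
        return idx + 1
    return idx


if __name__ == "__main__":
    # Hypothetical planner output: an L-shaped detour around an obstacle
    path = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
    print(path_guided_reward(path, (0.0, 0.0), (2.0, 1.0)))  # 2.0
```

Moving from (0, 0) to (2, 1) cuts the remaining path length from 7 to 5, so the step earns a reward of 2 even though the robot has drifted off the path; moving backwards along the path yields a negative reward. Under these assumptions, each sub-goal produced by the LTL decomposition would get its own planned path and its own dense reward.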

翻译：利用深度强化学习进行机器人导航任务的模型无关连续控制依赖于带噪声策略进行探索，其对奖励密度敏感。实际应用中，机器人通常部署在包含大量障碍物和狭窄通道的杂乱环境中。设计有效稠密奖励具有挑战性，导致训练过程中出现探索问题。当任务通过时间逻辑规范描述时，这一问题更为严峻。本文提出一种深度策略梯度算法，用于控制未知动力系统的机器人在杂乱环境中执行由线性时序逻辑公式指定的任务。为克服训练过程中环境探索的挑战，我们提出一种新型路径规划引导奖励机制，通过集成基于采样的方法有效完成目标到达任务。为促进线性时序逻辑满足，我们的方法将线性时序逻辑任务分解为以分布式方式求解的子目标到达任务。实验表明，该框架显著提升了大规模杂乱环境中执行复杂任务机器人的性能（有效性、效率）与探索能力。视频演示可于YouTube频道观看：https://youtu.be/yMh_NUNWxho。