Model-free continuous control for robot navigation tasks using Deep Reinforcement Learning (DRL) that relies on noisy policies for exploration is sensitive to the density of rewards. In practice, robots are usually deployed in cluttered environments that contain many obstacles and narrow passageways. Designing dense, effective rewards for such environments is challenging, which leads to exploration issues during training. The problem becomes more severe when tasks are described by temporal logic specifications. This work presents a deep policy gradient algorithm for controlling a robot with unknown dynamics operating in a cluttered environment when the task is specified as a Linear Temporal Logic (LTL) formula. To overcome the exploration challenge posed by the environment during training, we propose a novel path planning-guided reward scheme that integrates sampling-based methods to effectively complete goal-reaching missions. To facilitate LTL satisfaction, our approach decomposes the LTL mission into sub-goal-reaching tasks that are solved in a distributed manner. Our framework is shown to significantly improve the performance (effectiveness and efficiency) and exploration of robots tasked with complex missions in large-scale cluttered environments. A video demonstration is available on YouTube: https://youtu.be/yMh_NUNWxho.
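To make the idea of a path planning-guided reward more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a collision-free path has already been computed by some sampling-based planner and is given as a list of 2D waypoints. The function name `path_guided_reward` and the parameters `radius` and `bonus` are hypothetical and chosen only for illustration. The agent receives a bonus each time it reaches a waypoint further along the path than any it has reached before, which densifies an otherwise sparse goal-reaching reward.

```python
import math


def path_guided_reward(position, waypoints, best_index, radius=0.5, bonus=1.0):
    """Illustrative waypoint-progress reward (an assumption, not the paper's exact scheme).

    position:   current (x, y) of the robot
    waypoints:  planned path as a list of (x, y) points, start first, goal last
    best_index: index of the furthest waypoint reached so far in this episode
    Returns (reward, new_best_index).
    """
    reached = best_index
    # Only waypoints beyond the current best can yield a reward, so the
    # agent cannot collect the same bonus twice by moving back and forth.
    for i in range(best_index + 1, len(waypoints)):
        if math.dist(position, waypoints[i]) <= radius:
            reached = i
    # The reward is proportional to the number of waypoints advanced.
    reward = bonus * (reached - best_index)
    return reward, reached


# Hypothetical path, e.g. as returned by a sampling-based planner.
waypoints = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
r1, best = path_guided_reward((1.1, 0.0), waypoints, best_index=0)
r2, best2 = path_guided_reward((1.1, 0.0), waypoints, best_index=best)
```

In this sketch the first call rewards progress to the second waypoint, while the repeated call at the same position yields no further reward. In a sub-goal decomposition of an LTL mission, one such path and progress index could be maintained per sub-goal-reaching task, though how the paper combines them is not specified here.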