In this paper, we study the application of deep reinforcement learning (DRL) algorithms to local navigation problems. In these problems, a robot equipped only with limited-range exteroceptive sensors, such as LiDAR, moves toward a goal location in unknown and cluttered workspaces. Collision avoidance policies based on DRL offer several advantages, but they are highly susceptible to local minima because their capacity to learn suitable actions is limited by the sensor range. Since most robots perform tasks in unstructured environments, it is of great interest to find generalized local navigation policies that avoid local minima, especially in scenarios not seen during training. To this end, we propose a novel reward function that incorporates map information gained in the training stage, increasing the agent's capacity to deliberate about the best course of action. We also train our artificial neural network (ANN) with the Soft Actor-Critic (SAC) algorithm, which proves more effective than other algorithms from the state-of-the-art literature. A set of sim-to-sim and sim-to-real experiments shows that our proposed reward combined with SAC outperforms the compared methods in terms of local minima and collision avoidance.
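The abstract does not give the reward's formula. The sketch below is a hypothetical illustration, not the authors' formulation, of one way map information available during training could enter a reward. It replaces Euclidean progress toward the goal, which can penalize an agent for backing out of a dead end, with progress measured along a breadth-first-search (geodesic) distance computed on a known occupancy grid. The names `geodesic_distance_map`, `euclidean_progress`, and `progress_reward`, the 4-connected grid, and the reward constants are all assumptions made for illustration.

```python
import math
from collections import deque

def geodesic_distance_map(grid, goal):
    """Hypothetical helper: BFS step counts from every free cell to `goal`.

    Assumes grid[r][c] == 0 means free and 1 means occupied, 4-connected moves,
    and a free goal cell. Unreachable cells keep math.inf.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and dist[nr][nc] == math.inf):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def euclidean_progress(prev_cell, cell, goal, k=1.0):
    """Baseline shaping term: decrease in straight-line distance to the goal."""
    return k * (math.dist(prev_cell, goal) - math.dist(cell, goal))

def progress_reward(dist, prev_cell, cell, collided, reached,
                    k=1.0, r_goal=100.0, r_collision=-100.0):
    """Hypothetical map-aware reward: decrease in geodesic distance to the goal.

    Assumes both cells are reachable (finite distance); terminal events
    override the shaping term.
    """
    if collided:
        return r_collision
    if reached:
        return r_goal
    return k * (dist[prev_cell[0]][prev_cell[1]] - dist[cell[0]][cell[1]])

# Illustrative map: a cup-shaped obstacle open to the left, goal to its right.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
GOAL = (2, 4)

if __name__ == "__main__":
    dist = geodesic_distance_map(GRID, GOAL)
    # Backing out of the cup: from (2, 2) to (2, 1).
    print(euclidean_progress((2, 2), (2, 1), GOAL))               # -1.0
    print(progress_reward(dist, (2, 2), (2, 1), False, False))    # 1.0
```

In this toy map, leaving the cup moves the agent farther from the goal in straight-line terms, so a Euclidean progress term punishes it. The geodesic term rewards the same move because it shortens the collision-free path. This is the kind of local-minimum trap that limited-range sensing alone cannot resolve.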