This work presents a study of parallel and distributional deep reinforcement learning applied to the mapless navigation of UAVs. To this end, we developed an approach based on the Soft Actor-Critic (SAC) method, producing a distributed and distributional variant named PDSAC, and compared it with a second agent based on the standard SAC algorithm. We also embedded a prioritized memory system into both agents. The UAV used in the study is based on the Hydrone vehicle, a hybrid quadrotor, here operated solely in the air. The system receives as input 23 range readings from a LiDAR sensor together with the distance and angles to the desired goal, and it outputs linear, angular, and altitude velocities. The agents were trained in environments of varying complexity, from obstacle-free settings to environments with multiple obstacles in three dimensions. The results show a clear improvement in navigation capability for the proposed approach compared with the SAC-based agent for the same number of training steps. Overall, this study of deep reinforcement learning for three-dimensional mapless drone navigation yields promising results, with potential applications of distributed and distributional variants in robotics and autonomous aerial navigation.
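The abstract does not specify the form of the prioritized memory system. As a minimal sketch, assuming the proportional prioritized experience replay scheme (Schaul et al., 2016), the Python below stores transitions whose observation concatenates the 23 LiDAR ranges with the goal distance and angles, and whose action holds the three velocity commands. This is not the authors' implementation: the class name, the `alpha`, `beta` and `eps` values, and the assumption of two goal angles are hypothetical choices made only for illustration.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, not the paper's code).

    Transition i is sampled with probability P(i) = p_i^alpha / sum_k p_k^alpha,
    and importance-sampling weights (N * P(i))^-beta correct the sampling bias.
    """

    def __init__(self, capacity, obs_dim, act_dim, alpha=0.6, beta=0.4, eps=1e-6, seed=0):
        # alpha, beta and eps are assumed hyperparameters, not values from the paper.
        self.capacity = capacity
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(capacity, dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.done = np.zeros(capacity, dtype=np.float32)
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos, self.size = 0, 0
        self.rng = np.random.default_rng(seed)

    def add(self, o, a, r, o2, d):
        # New transitions get the current maximum priority so they are likely to be replayed soon.
        max_p = self.priorities[: self.size].max() if self.size > 0 else 1.0
        i = self.pos
        self.obs[i], self.act[i], self.rew[i] = o, a, r
        self.next_obs[i], self.done[i] = o2, d
        self.priorities[i] = max_p
        # Circular buffer: overwrite the oldest transition once full.
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        scaled = self.priorities[: self.size] ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(self.size, size=batch_size, p=probs)
        weights = (self.size * probs[idx]) ** (-self.beta)
        weights /= weights.max()  # normalize so the largest weight is 1
        batch = (self.obs[idx], self.act[idx], self.rew[idx], self.next_obs[idx], self.done[idx])
        return batch, idx, weights.astype(np.float32)

    def update_priorities(self, idx, td_errors):
        # Priority is |TD error| + eps so that no stored transition gets zero probability.
        self.priorities[idx] = np.abs(td_errors) + self.eps


# Hypothetical usage with random placeholder data (not results from the paper).
OBS_DIM = 23 + 1 + 2  # 23 LiDAR ranges + goal distance + 2 goal angles (angle count assumed)
ACT_DIM = 3           # linear, angular and altitude velocities
buf = PrioritizedReplayBuffer(capacity=1000, obs_dim=OBS_DIM, act_dim=ACT_DIM)
data_rng = np.random.default_rng(1)
for _ in range(64):
    o = data_rng.random(OBS_DIM)
    a = data_rng.uniform(-1.0, 1.0, ACT_DIM)
    buf.add(o, a, 0.0, o, 0.0)
batch, idx, weights = buf.sample(32)
# During training, td_errors would come from the critic; random values stand in here.
buf.update_priorities(idx, data_rng.normal(size=idx.shape))
```

Sampling here costs O(N) per batch to keep the sketch short; a sum-tree is the usual choice for large buffers. With a distributional critic such as the one in PDSAC, the scalar TD error used as a priority has to be derived from the predicted return distribution (for example, a mean quantile error), which is a further assumption that the abstract does not settle.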