Dynamic obstacle avoidance (DOA) is a fundamental challenge for any autonomous vehicle, whether it operates at sea, in the air, or on land. This paper proposes a two-step architecture for DOA tasks that combines supervised learning and reinforcement learning (RL). In the first step, we introduce a data-driven approach that estimates the collision risk of an obstacle with a recurrent neural network, which is trained in a supervised fashion and is robust to non-linear obstacle movements. In the second step, we incorporate these collision risk estimates into the observation space of an RL agent to increase its situational awareness. We illustrate the power of our two-step approach by training different RL agents in a challenging environment that requires navigating amid multiple obstacles. As an example, the non-linear obstacle movements are modeled with stochastic processes and periodic patterns, although our architecture is suitable for any obstacle dynamics. The experiments reveal that integrating our collision risk metrics into the observation space doubles the performance in terms of reward, which is equivalent to halving the number of collisions in the considered environment. Furthermore, we show that the architecture's performance improvement is independent of the applied RL algorithm.
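To make the data flow of the two-step architecture concrete, the following minimal sketch shows how per-obstacle risk estimates could be appended to an agent's observation vector. It is an illustration under stated assumptions, not the implementation used in the paper: the plain Elman-style recurrent cell, the random untrained weights, the four-dimensional obstacle state, and the names `estimate_risk` and `augment_observation` are all hypothetical. In the proposed architecture the estimator would be trained in a supervised fashion beforehand.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def estimate_risk(track, w_in, w_h, w_out):
    """Step 1 (sketch): map one obstacle track to a risk score in [0, 1].

    `track` is a list of per-time-step obstacle states. A simple Elman-style
    recurrent cell stands in for the recurrent network; the weights here are
    hypothetical and untrained.
    """
    hidden = [0.0] * len(w_h)
    for state in track:
        pre = [a + b for a, b in zip(matvec(w_in, state), matvec(w_h, hidden))]
        hidden = [math.tanh(p) for p in pre]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid squashes to [0, 1]


def augment_observation(base_obs, tracks, w_in, w_h, w_out):
    """Step 2 (sketch): append one risk estimate per obstacle to the observation."""
    risks = [estimate_risk(t, w_in, w_h, w_out) for t in tracks]
    return list(base_obs) + risks


# Hypothetical sizes: 4 features per obstacle state, 8 hidden units.
rng = random.Random(0)
n_features, n_hidden = 4, 8
w_in = [[rng.gauss(0, 0.1) for _ in range(n_features)] for _ in range(n_hidden)]
w_h = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_hidden)]
w_out = [rng.gauss(0, 0.5) for _ in range(n_hidden)]

base_obs = [0.0] * 6  # placeholder for the agent's own state features
tracks = [
    [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(10)]
    for _ in range(3)  # three obstacles, ten time steps each
]
obs = augment_observation(base_obs, tracks, w_in, w_h, w_out)
```

Because the risk estimates enter only through the observation vector, any RL algorithm that consumes a fixed-size observation can use them unchanged, which is in line with the reported independence from the applied RL algorithm.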