Deep Reinforcement Learning (DRL) is quickly becoming a popular method for training autonomous Unmanned Aerial Vehicles (UAVs). Our work analyzes the effects of measurement uncertainty on the performance of DRL-based waypoint navigation and obstacle avoidance for UAVs. Measurement uncertainty originates from noise in the sensors used for localization and obstacle detection. We model this noise as following a Gaussian probability distribution with unknown non-zero mean and variance. We evaluate the performance of a DRL agent trained using the Proximal Policy Optimization (PPO) algorithm in an environment with continuous state and action spaces. To capture the effects of realistic sensor measurements, the environment is randomized with a different number of obstacles in each simulation episode and subjected to varying degrees of noise. Denoising techniques such as the low-pass filter and the Kalman filter improve performance in the presence of unbiased noise. Moreover, we show that artificially injecting noise into the measurements during evaluation actually improves performance in certain scenarios. The DRL agent is trained and tested extensively under various UAV navigation scenarios in the PyBullet physics simulator. To evaluate the practical validity of our method, we port the policy trained in simulation onto a real UAV without any further modifications and verify the results in a real-world environment.
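The noise model and the two denoising techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scalar "distance to obstacle" signal, and all parameter values (`alpha`, `process_var`, the bias and standard deviation) are assumptions chosen only for illustration. The sketch corrupts a constant true value with Gaussian noise of configurable mean (bias) and standard deviation, then applies a first-order low-pass filter and a scalar Kalman filter.

```python
import math
import random


def noisy(true_value, bias, std, rng):
    """Return a measurement corrupted by Gaussian noise N(bias, std^2)."""
    return true_value + rng.gauss(bias, std)


def low_pass(measurements, alpha=0.2):
    """First-order low-pass filter (exponential moving average)."""
    estimate = measurements[0]  # initialise with the first sample
    filtered = []
    for z in measurements:
        estimate = alpha * z + (1.0 - alpha) * estimate
        filtered.append(estimate)
    return filtered


def kalman_1d(measurements, process_var=1e-4, meas_var=0.25):
    """Scalar Kalman filter for a (nearly) constant state."""
    estimate = measurements[0]
    error_var = meas_var
    filtered = []
    for z in measurements:
        # Predict: state assumed constant, uncertainty grows by process_var.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


def rmse(values, reference):
    """Root-mean-square error of a sequence against a scalar reference."""
    return math.sqrt(sum((v - reference) ** 2 for v in values) / len(values))


def run_demo(bias=0.0, std=0.5, n=2000, seed=0):
    """Compare raw and filtered errors for one synthetic noise setting."""
    rng = random.Random(seed)
    true_distance = 2.0  # hypothetical true sensor reading
    raw = [noisy(true_distance, bias, std, rng) for _ in range(n)]
    return {
        "raw": rmse(raw, true_distance),
        "low_pass": rmse(low_pass(raw), true_distance),
        "kalman": rmse(kalman_1d(raw, meas_var=std ** 2), true_distance),
    }


if __name__ == "__main__":
    print("unbiased:", run_demo(bias=0.0))
    print("biased:  ", run_demo(bias=0.3))
```

In this toy setting both filters reduce the error when the noise is zero-mean. With a non-zero bias, the filtered estimate settles around the biased value rather than the true one, because both filters pass a constant offset through unchanged. That is consistent with the abstract restricting the denoising benefit to unbiased noise.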