Deep reinforcement learning (DRL) is used to enable autonomous navigation in unknown environments. Most research assumes perfect sensor data, but real-world environments may contain natural and artificial sensor noise and denial. Here, we present a benchmark of both widely used and emerging DRL algorithms on a navigation task with configurable sensor-denial effects. In particular, we are interested in comparing how different DRL methods (e.g., model-free PPO vs. model-based DreamerV3) are affected by sensor denial. We show that DreamerV3 outperforms other methods on the visual end-to-end navigation task with a dynamic goal, a task the other methods fail to learn at all. Furthermore, DreamerV3 generally outperforms other methods in sensor-denied environments. To improve robustness, we apply adversarial training and demonstrate improved performance in denied environments, although this generally comes at a performance cost on the vanilla environments. We anticipate that this benchmark of DRL methods and the use of adversarial training will serve as a starting point for developing more elaborate navigation strategies capable of dealing with uncertain and denied sensor readings.
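The abstract's "configurable sensor denial effects" can be pictured as a corruption model applied to each observation before the agent sees it. The following is a minimal sketch of such a model, not the paper's actual implementation: the class name, parameters (`p_deny`, `sigma`), and the choice of zeroing for denial versus additive Gaussian noise are all illustrative assumptions.

```python
import numpy as np

class SensorDenial:
    """Illustrative sensor-denial model (hypothetical, not the paper's code):
    with probability p_deny the observation is fully denied (zeroed out);
    otherwise additive Gaussian noise of scale sigma corrupts it."""

    def __init__(self, p_deny=0.1, sigma=0.05, seed=None):
        self.p_deny = p_deny            # probability of total sensor denial
        self.sigma = sigma              # std of additive Gaussian noise
        self.rng = np.random.default_rng(seed)

    def __call__(self, obs):
        obs = np.asarray(obs, dtype=np.float32)
        if self.rng.random() < self.p_deny:
            return np.zeros_like(obs)   # denied reading: all information lost
        noise = self.rng.normal(0.0, self.sigma, obs.shape).astype(np.float32)
        return obs + noise              # degraded but usable reading
```

In a benchmark like the one described, a wrapper of this kind would sit between the environment and the agent, so that model-free and model-based methods can be compared under identical corruption settings.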