An exciting and promising frontier for Deep Reinforcement Learning (DRL) is its application to real-world robotic systems. While modern DRL approaches have achieved remarkable successes in many robotic scenarios (including mobile robotics, surgical assistance, and autonomous driving), unpredictable and non-stationary environments can pose critical challenges to such methods. These features can significantly undermine fundamental requirements for a successful training process, such as the Markov property of the transition model. To study this challenge, we propose a new benchmarking environment for aquatic navigation that builds on recent advances in the integration between game engines and DRL. In more detail, we show that our benchmarking environment is challenging even for state-of-the-art DRL approaches, which may struggle to generate policies that are reliable in terms of generalization and safety. Specifically, we focus on Proximal Policy Optimization (PPO), one of the most widely accepted algorithms, and we propose advanced training techniques (such as curriculum learning and learnable hyperparameters). Our extensive empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results. Our simulation environment and training baselines are freely available to facilitate further research on this open problem and to encourage collaboration in the field.