Autonomous robots are often employed for data collection due to their efficiency and low labour costs. A key task in robotic data acquisition is planning paths through an initially unknown environment to collect observations, subject to platform-specific resource constraints such as limited battery life. Adaptive online path planning in 3D environments is challenging due to the large set of valid actions and the presence of unknown occlusions. To address these issues, we propose a novel deep reinforcement learning approach for adaptively replanning robot paths to map targets of interest in unknown 3D environments. A key aspect of our approach is a dynamically constructed graph that restricts planning actions to the robot's local region, allowing us to react to newly discovered static obstacles and targets of interest. For replanning, we propose a new reward function that balances exploring the unknown environment with exploiting online-discovered targets of interest. Our experiments show that our method enables more efficient target discovery compared to state-of-the-art learning and non-learning baselines. We also showcase our approach for orchard monitoring using an unmanned aerial vehicle in a photorealistic simulator. We open-source our code and model at: https://github.com/dmar-bonn/ipp-rl-3d.
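The exploration-exploitation trade-off described above can be illustrated with a minimal sketch. This is not the paper's actual reward function; the function name, inputs (`new_unknown_cells`, `new_target_cells`), and the weight `alpha` are all hypothetical, chosen only to show how a single scalar weight could balance coverage of unknown space against attention to online-discovered targets.

```python
def replanning_reward(new_unknown_cells: int,
                      new_target_cells: int,
                      alpha: float = 0.5) -> float:
    """Hypothetical reward balancing exploration and exploitation.

    new_unknown_cells: count of previously unobserved map cells covered
                       by the latest measurement (exploration signal)
    new_target_cells:  count of newly observed target-of-interest cells
                       (exploitation signal)
    alpha:             trade-off weight in [0, 1]; an assumption here,
                       not a value from the paper
    """
    exploration = float(new_unknown_cells)
    exploitation = float(new_target_cells)
    # Convex combination: alpha = 0 rewards pure exploration,
    # alpha = 1 rewards pure exploitation of discovered targets.
    return (1.0 - alpha) * exploration + alpha * exploitation
```

In practice such a reward would also account for movement cost under the robot's budget, but the core idea is that a single weighted sum lets the planner shift between mapping unknown regions and revisiting areas where targets were found.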