Animals learn to adapt the speed of their movements to their capabilities and the environment they observe. Mobile robots should likewise be able to trade off aggressiveness against safety to accomplish tasks efficiently. The aim of this work is to endow flight vehicles with the ability to adapt their speed in previously unknown, partially observable cluttered environments. We propose a hierarchical learning and planning framework that combines well-established model-based trajectory generation with trial-and-error learning to obtain a policy that dynamically configures the speed constraint. Technically, we use online reinforcement learning to obtain the deployable policy. Statistical results in simulation demonstrate the advantages of our method over constant-speed-constraint baselines and an alternative method in terms of flight efficiency and safety. In particular, the learned policy is perception-aware, which distinguishes it from the alternative approaches. By deploying the policy on hardware, we verify that these advantages carry over to the real world.
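The hierarchical idea in the abstract can be sketched as follows: a learned policy maps the current observation to a speed constraint, which a model-based planner then respects when generating a trajectory. This is a minimal illustrative sketch, not the paper's implementation; the linear policy, bounds (`V_MIN`, `V_MAX`), feature vector, and the constant-speed stand-in planner are all assumptions for exposition (a real policy would be a trained network and the planner a proper trajectory optimizer).

```python
import numpy as np

# Assumed speed-constraint bounds in m/s (illustrative values, not from the paper).
V_MIN, V_MAX = 0.5, 5.0

def speed_policy(obs: np.ndarray, w: np.ndarray) -> float:
    """Toy linear policy: squashes observation features into [V_MIN, V_MAX].
    In the paper this role is played by a policy trained with online RL."""
    a = np.tanh(float(obs @ w))                      # action in [-1, 1]
    return V_MIN + 0.5 * (a + 1.0) * (V_MAX - V_MIN)

def plan_trajectory(goal_dist: float, v_max: float, dt: float = 0.1):
    """Stand-in for model-based trajectory generation: a constant-speed
    profile capped at the policy's speed constraint v_max."""
    t, x, traj = 0.0, 0.0, []
    while x < goal_dist:
        x = min(x + v_max * dt, goal_dist)
        t += dt
        traj.append((t, x))
    return traj

# Hypothetical observation features (e.g. a compressed depth image) and
# policy weights; the weights would come from reinforcement learning.
obs = np.array([0.2, -0.4, 0.1])
w = np.array([1.0, 0.5, -0.3])

v_cap = speed_policy(obs, w)            # perception-dependent speed constraint
traj = plan_trajectory(goal_dist=10.0, v_max=v_cap)
```

The key design point sketched here is the interface between the two layers: the policy outputs only a scalar constraint, so the planner's safety and dynamic-feasibility machinery is untouched, while the learned component adapts how aggressive the flight is.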