The field of aerial manipulation has seen rapid advances, transitioning from push-and-slide tasks to interaction with articulated objects. So far, when more complex actions are performed, the motion trajectory is usually handcrafted or the result of online optimization methods such as Model Predictive Control (MPC) or Model Predictive Path Integral (MPPI) control. However, these methods rely on heuristics or model simplifications to run efficiently on onboard hardware and produce results in an acceptable amount of time. Moreover, they can be sensitive to disturbances and to differences between the real environment and its simulated counterpart. In this work, we propose a Reinforcement Learning (RL) approach to learn motion behaviors for a manipulation task while producing policies that are robust to disturbances and modeling errors. Specifically, we train a policy to perform a door-opening task with an Omnidirectional Micro Aerial Vehicle (OMAV). The policy is trained in a physics simulator, and experiments are presented both in simulation and onboard the real platform to investigate simulation-to-real-world transfer. We compare our method against a state-of-the-art MPPI solution, showing a considerable increase in robustness and speed.