Performing precise, aggressive maneuvers with lightweight onboard sensors remains a key bottleneck in fully exploiting the maneuverability of drones. Such maneuvers are critical for expanding a system's accessible area by navigating through narrow openings in the environment. A representative problem is the aggressive traversal of narrow gaps by quadrotors under SE(3) constraints, which requires the quadrotor to exploit a momentarily tilted attitude and the asymmetry of its airframe to pass through the gap. In this paper, we achieve such maneuvers by developing sensorimotor policies that map onboard vision and proprioception directly to low-level control commands. The policies are trained in simulation using reinforcement learning (RL) with end-to-end policy distillation. We mitigate the fundamental difficulty of model-free RL exploration in the restricted solution space with an initialization strategy that leverages trajectories generated by a model-based planner. Careful sim-to-real design allows the policy to control a quadrotor through narrow gaps with low clearance and high repeatability. For instance, the proposed method enables a quadrotor to traverse a rectangular gap with 5 cm of clearance, tilted at up to 90 degrees, without knowledge of the gap's position or orientation. Although never trained on dynamic gaps, the policy can reactively servo the quadrotor through a moving gap. The proposed method is also validated by training and deploying policies on challenging tracks of closely spaced narrow gaps. The flexibility of the policy learning method is demonstrated by developing policies for geometrically diverse gaps without relying on manually defined traversal poses or visual features.