Autonomous FPV quadrotor flight in complex environments using a monocular RGB camera as the sole exteroceptive sensor remains a fundamental challenge. Recent research has shown that feeding optical flow into a neural network can enable end-to-end autonomous flight in cluttered scenes. However, extracting the most relevant information from the estimated flow remains the key bottleneck limiting agility and robustness: existing methods struggle to disentangle obstacle-induced optical flow from the background flow caused by ego-motion, and they suffer from low signal-to-noise ratios near the focus of expansion (FoE). To address these issues, we decompose the optical flow into translational and rotational components and use only the translational flow, which carries scene geometry and depth cues. In addition, we introduce an uncertainty mask derived from inconsistencies between forward and backward flow estimates; this mask highlights obstacle structures, including those within the FoE region. Both cues are fed to a control policy trained in a differentiable simulation framework, which enables efficient first-order optimization across perception and control. We validate our approach through extensive experiments in both simulated and real-world forest environments. The proposed system achieves robust flight at speeds of up to 13.91 m/s in simulation and 11.79 m/s in real-world tests, with a 93.3% success rate over 30 real-world trials, nearly doubling the 6 m/s real-world speed previously reported for monocular-RGB optical-flow-based UAV obstacle avoidance.
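The abstract does not say how the flow decomposition or the uncertainty mask is computed. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a calibrated pinhole camera (focal length `f`, principal point `cx`, `cy`; x right, y down, z forward), a body angular velocity `omega` (for example from a gyroscope) that is known and constant over the frame interval `dt`, and flow fields stored as nested lists of per-pixel `(du, dv)` pixel displacements. The rotational component follows the classical motion-field equations. It does not depend on depth, so it can be subtracted from the measured flow. For the mask, the sketch swaps in a common forward-backward check, |F + B|² > α1(|F|² + |B|²) + α2, with nearest-neighbour lookup. The function names and the thresholds `alpha1`/`alpha2` are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]
Flow = List[List[Vec2]]  # flow[row][col] = (du, dv) in pixels (assumed layout)


def rotational_flow(px: float, py: float, omega: Vec3,
                    f: float, cx: float, cy: float, dt: float) -> Vec2:
    """Pixel displacement over dt caused only by camera rotation (hypothetical helper)."""
    x = (px - cx) / f  # normalized image coordinates
    y = (py - cy) / f
    wx, wy, wz = omega
    # Rotational part of the pinhole motion field (depth-independent).
    u = x * y * wx - (1.0 + x * x) * wy + y * wz
    v = (1.0 + y * y) * wx - x * y * wy - x * wz
    # Convert the normalized rate to pixels over one frame (small-rotation approximation).
    return (f * u * dt, f * v * dt)


def derotate(flow: Flow, omega: Vec3, f: float, cx: float, cy: float, dt: float) -> Flow:
    """Subtract the rotational component, leaving the translational (depth-bearing) flow."""
    out: Flow = []
    for py, row in enumerate(flow):
        new_row: List[Vec2] = []
        for px, (du, dv) in enumerate(row):
            ru, rv = rotational_flow(px, py, omega, f, cx, cy, dt)
            new_row.append((du - ru, dv - rv))
        out.append(new_row)
    return out


def fb_uncertainty_mask(fwd: Flow, bwd: Flow,
                        alpha1: float = 0.01, alpha2: float = 0.5) -> List[List[bool]]:
    """True where forward and backward flow disagree (hypothetical thresholds)."""
    h, w = len(fwd), len(fwd[0])
    mask: List[List[bool]] = []
    for py in range(h):
        row: List[bool] = []
        for px in range(w):
            fu, fv = fwd[py][px]
            # Follow the forward flow, then read the backward flow at the landing pixel.
            qx, qy = int(round(px + fu)), int(round(py + fv))
            if not (0 <= qx < w and 0 <= qy < h):
                row.append(True)  # flow leaves the image: treat as uncertain
                continue
            bu, bv = bwd[qy][qx]
            err = (fu + bu) ** 2 + (fv + bv) ** 2  # zero for perfectly consistent flow
            bound = alpha1 * (fu * fu + fv * fv + bu * bu + bv * bv) + alpha2
            row.append(err > bound)
        mask.append(row)
    return mask
```

For example, a flow field produced purely by a yaw rate `omega = (0.0, 0.5, 0.0)` derotates to zero everywhere. What remains after derotation is the translational term, proportional to (x·Tz − Tx)/Z. That term vanishes at the FoE, which is consistent with the low signal-to-noise ratio near the FoE that the abstract describes.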