Autonomous drone racing demands robust real-time localization under extreme conditions: high-speed flight, aggressive maneuvers, and payload-constrained platforms that often rely on a single camera for perception. Existing visual SLAM systems, while effective in general scenarios, struggle with the motion blur and feature instability inherent to racing dynamics, and do not exploit the structured nature of racing environments. In this work, we present a dual pose-graph architecture that fuses odometry with semantic detections for robust localization. A temporary graph accumulates multiple gate observations between keyframes and optimizes them into a single refined constraint per landmark, which is then promoted to a persistent main graph. This design preserves the information richness of frequent detections while preventing graph growth from degrading real-time performance. The system is designed to be sensor-agnostic, although in this work we validate it using monocular visual-inertial odometry (VIO) and visual gate detections. Experimental evaluation on the TII-RATM dataset shows a 56% to 74% reduction in absolute trajectory error (ATE) compared to standalone VIO, while an ablation study confirms that the dual-graph architecture achieves 10% to 12% higher accuracy than a single-graph baseline at identical computational cost. Deployment in the A2RL competition demonstrated that the system performs real-time onboard localization during flight, reducing the drift of the odometry baseline by up to 4.2 m per lap.
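The temporary-to-main promotion step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class and field names are hypothetical, gate observations are reduced to 3D positions with isotropic noise, and the "optimization" of the temporary graph is replaced by an inverse-variance weighted mean, which is the closed-form least-squares solution only under that simplified linear model. A real system would presumably optimize full poses with a pose-graph solver.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GateObservation:
    """One gate detection (hypothetical, simplified to a 3D position)."""
    gate_id: int
    position: tuple   # assumed gate position relative to the last keyframe, in metres
    variance: float   # assumed isotropic measurement variance, in m^2


@dataclass
class DualGraph:
    """Toy stand-in for the temporary graph and the persistent main graph."""
    temporary: dict = field(default_factory=lambda: defaultdict(list))
    main: list = field(default_factory=list)

    def add_observation(self, obs):
        # Every detection between keyframes is buffered in the temporary graph.
        self.temporary[obs.gate_id].append(obs)

    def promote(self, keyframe_id):
        # Fuse all buffered observations of each gate into one constraint,
        # append that single constraint to the main graph, then reset the buffer.
        for gate_id, observations in self.temporary.items():
            weights = [1.0 / o.variance for o in observations]
            total = sum(weights)
            fused = tuple(
                sum(w * o.position[axis] for w, o in zip(weights, observations)) / total
                for axis in range(3)
            )
            # The fused variance shrinks as more observations are accumulated.
            self.main.append((keyframe_id, gate_id, fused, 1.0 / total))
        self.temporary.clear()


graph = DualGraph()
graph.add_observation(GateObservation(gate_id=7, position=(4.0, 0.0, 1.0), variance=0.25))
graph.add_observation(GateObservation(gate_id=7, position=(4.5, 0.0, 1.0), variance=0.25))
graph.promote(keyframe_id=1)
print(graph.main)
```

In this sketch the main graph grows by one edge per observed gate per keyframe, regardless of how many detections arrived in between, which mirrors the bounded-growth property the abstract claims for the dual-graph design.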