Visual control enables quadrotors to navigate adaptively using real-time sensory data, bridging perception and action. Yet challenges persist, including generalization across scenarios, reliability, and real-time responsiveness. This paper introduces a perception framework grounded in foundation models for universal object detection and tracking, moving beyond specific training categories. Integral to our approach is a multi-layered tracker integrated with the foundation detector, which maintains continuous target visibility even under motion blur, abrupt lighting changes, and occlusions. Complementing this, we introduce a model-free controller tailored for resilient quadrotor visual tracking. Our system operates efficiently on limited hardware, relying solely on an onboard camera and an inertial measurement unit. Through extensive validation in diverse, challenging indoor and outdoor environments, we demonstrate our system's effectiveness and adaptability. Our research represents a step forward in quadrotor visual tracking, moving from task-specific methods to more versatile and adaptable operations.