Visual control enables quadrotors to navigate adaptively using real-time sensory data, bridging perception and action. Yet challenges persist, including generalization across scenarios, reliability, and real-time responsiveness. This paper introduces a perception framework grounded in foundation models for universal object detection and tracking that is not restricted to specific training categories. Central to our approach is a multi-layered tracker integrated with the foundation detector, which maintains continuous target visibility even under motion blur, abrupt lighting changes, and occlusions. Complementing this, we introduce a model-free controller tailored for resilient quadrotor visual tracking. Our system runs efficiently on resource-limited hardware, relying solely on an onboard camera and an inertial measurement unit (IMU). Through extensive validation in diverse, challenging indoor and outdoor environments, we demonstrate the system's effectiveness and adaptability. Overall, this work advances quadrotor visual tracking from task-specific methods toward more versatile and adaptable operation.