In the rapidly evolving field of vision-language navigation (VLN), ensuring safety for physical agents remains an open challenge. To navigate safely, a human-in-the-loop, language-operated drone must understand natural language commands, perceive the environment, and avoid hazards in real time. Control Barrier Functions (CBFs) are formal methods that enforce safe operating conditions. Model Predictive Control (MPC) is an optimization framework that plans a sequence of future actions over a prediction horizon, ensuring smooth trajectory tracking while obeying constraints. In this work, we consider a VLN-operated drone platform and enhance its safety by formulating a novel scene-aware CBF that leverages ego-centric observations from an RGB-D (Red-Green-Blue plus Depth) camera. A CBF-less baseline system uses a Vision-Language Encoder with cross-modal attention to convert commands into an ordered sequence of landmarks. An object detection model identifies and verifies these landmarks in the captured images to generate a planned path. To further enhance safety, an Adaptive Safety Margin Algorithm (ASMA) is proposed. ASMA tracks moving objects and performs scene-aware CBF evaluations on the fly, which serve as additional constraints within the MPC framework. By continuously identifying potentially risky observations, the system predicts unsafe conditions in real time and proactively adjusts its control actions to maintain safe navigation throughout the trajectory. Deployed on a Parrot Bebop2 quadrotor in the Gazebo environment using the Robot Operating System (ROS), ASMA achieves a 64%-67% increase in success rate with only a slight (1.4%-5.8%) increase in trajectory length compared to the baseline CBF-less VLN.
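To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of how a discrete-time CBF can act as a safety filter on a nominal controller: the barrier `h` is the distance to a detected obstacle minus a safety margin, and the filter scales the nominal velocity command until the CBF decrease condition `h(x_{k+1}) >= (1 - gamma) * h(x_k)` holds. The obstacle position, margin, and line-search filter are illustrative assumptions; the paper's ASMA instead derives the margin from scene-aware RGB-D observations and embeds the constraint in an MPC.

```python
import numpy as np

def h(x, obs, margin):
    # Barrier value: positive iff the drone keeps at least `margin`
    # distance from the obstacle center `obs`.
    return np.linalg.norm(x - obs) - margin

def cbf_filter(x, u_nom, obs, margin, dt=0.1, gamma=0.5):
    """Scale the nominal velocity command so the discrete-time CBF
    condition h(x + u*dt) >= (1 - gamma) * h(x) holds.
    A coarse line search stands in for the QP/MPC used in practice."""
    for alpha in np.linspace(1.0, 0.0, 101):
        u = alpha * u_nom
        if h(x + u * dt, obs, margin) >= (1.0 - gamma) * h(x, obs, margin):
            return u
    return np.zeros_like(u_nom)  # fall back to stopping

# Toy scenario: the drone heads straight at an obstacle on its path;
# the filter attenuates the command as the barrier approaches zero.
x = np.array([0.0, 0.0])
obs = np.array([2.0, 0.0])      # assumed obstacle position
goal = np.array([4.0, 0.0])
margin = 0.5
for _ in range(100):
    u_nom = goal - x            # simple proportional "go to goal" command
    x = x + cbf_filter(x, u_nom, obs, margin) * 0.1
assert h(x, obs, margin) > 0.0  # safe distance preserved
```

Because the condition only requires `h` to shrink by at most a factor `(1 - gamma)` per step, the drone can approach the margin asymptotically but never cross it, which is the qualitative behavior a CBF constraint contributes inside the MPC.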