Co-GLANCE: Uncertainty-Aware Active Perception for Heterogeneous Robot Teaming

Perceptual uncertainty is a central challenge for heterogeneous robot teams operating in unstructured outdoor environments, where no single viewpoint affords reliable scene understanding. This uncertainty, arising from sources such as occlusions, manifests differently across robot viewpoints depending on scene structure. Detecting and resolving it requires both scene-based contextual reasoning and capability-aware robot allocation. While vision-language models provide strong semantic priors for both, they are computationally prohibitive for onboard inference and lack calibrated uncertainty quantification. We introduce Co-GLANCE, a real-time onboard perception and decision-making system for uncertainty resolution in heterogeneous robot teams. Co-GLANCE distills the semantic reasoning capabilities of a vision-language model into an end-to-end model for occlusion segmentation and robot allocation, eliminating the need for cloud-based inference. To quantify perceptual uncertainty, Co-GLANCE combines conformal prediction with selective abstention to provide statistically valid coverage guarantees for segmentation, robot allocation, and detection outputs. These calibrated uncertainty estimates directly trigger active perception, dispatching the most appropriate robot to acquire informative viewpoints and resolve uncertainty. Across real-world scenarios, Co-GLANCE outperforms cloud-based vision-language model baselines in occlusion segmentation and robot allocation accuracy by 25% and 36%, respectively, while reducing per-frame inference latency by 350x. We also release an air-ground dataset for future research. Code, videos, and the dataset are available at https://co-glance.github.io/
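
To make the idea of combining conformal prediction with selective abstention concrete, the following is a minimal sketch of generic split conformal prediction applied to a robot-allocation classifier. It is not the Co-GLANCE implementation: the abstract does not specify the nonconformity score or the abstention rule, so the score (one minus the softmax probability of the true class), the rule (abstain unless the prediction set contains exactly one robot), and all function names here are assumptions chosen for illustration.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from a held-out calibration set.

    cal_probs:  (n, K) array of class probabilities (hypothetical model output).
    cal_labels: (n,) array of true class indices.
    alpha:      target miscoverage rate; sets cover the truth w.p. >= 1 - alpha
                (marginally, assuming exchangeable calibration and test data).
    """
    n = len(cal_labels)
    # Assumed nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return 1.0
    return float(np.sort(scores)[k - 1])


def prediction_set(probs, qhat):
    """Indices of all classes whose nonconformity score is within the threshold."""
    return np.flatnonzero(1.0 - np.asarray(probs) <= qhat)


def allocate_or_abstain(probs, qhat):
    """Return a robot index if the prediction set is a singleton, else None.

    Returning None stands in for abstention, which a system could use as
    the trigger for acquiring another viewpoint (assumed rule).
    """
    candidates = prediction_set(probs, qhat)
    return int(candidates[0]) if candidates.size == 1 else None
```

In this sketch an ambiguous frame yields an empty or multi-robot prediction set and the function abstains, which is the point at which an active-perception step could be requested. The same thresholding pattern could in principle be applied per pixel for segmentation, though the abstract does not say how the coverage guarantee is defined for that output.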
