Panoramic imagery provides holistic 360° visual coverage for perception in quadruped robots. However, existing occupancy prediction methods are mainly designed for wheeled autonomous driving and rely heavily on RGB cues, limiting their robustness in complex environments. To bridge this gap, (1) we present PanoMMOcc, the first real-world panoramic multimodal occupancy dataset for quadruped robots, featuring four sensing modalities across diverse scenes. (2) We propose a panoramic multimodal occupancy perception framework, VoxelHound, tailored for legged mobility and spherical imaging. Specifically, we design (i) a Vertical Jitter Compensation (VJC) module to mitigate severe viewpoint perturbations caused by body pitch and roll during mobility, enabling more consistent spatial reasoning, and (ii) an effective Multimodal Information Prompt Fusion (MIPF) module that jointly leverages panoramic visual cues and auxiliary modalities to enhance volumetric occupancy prediction. (3) We establish a benchmark based on PanoMMOcc and provide detailed data analysis to enable systematic evaluation of perception methods under challenging embodied scenarios. Extensive experiments demonstrate that VoxelHound achieves state-of-the-art performance on PanoMMOcc (+4.16% mIoU). The dataset and code will be publicly released at https://github.com/SXDR/PanoMMOcc, along with calibration tools at https://github.com/losehu/CameraLiDAR-Calib, to facilitate future research on panoramic multimodal 3D perception for embodied robotic systems.