Global perception is essential for embodied agents in 360° spaces, yet current affordance grounding remains largely object-centric and restricted to perspective views. To bridge this gap, we introduce a novel task: Holistic Affordance Grounding in 360° Indoor Environments. This task poses unique challenges, including severe geometric distortions from Equirectangular Projection (ERP), semantic dispersion, and cross-scale alignment difficulties. We propose PanoAffordanceNet, an end-to-end framework featuring a Distortion-Aware Spectral Modulator (DASM) for latitude-dependent calibration and an Omni-Spherical Densification Head (OSDH) that restores topological continuity from sparse activations. By integrating multi-level constraints comprising pixel-wise, distributional, and region-text contrastive objectives, our framework effectively suppresses semantic drift under limited supervision. Furthermore, we construct 360-AGD, the first high-quality panoramic affordance grounding dataset. Extensive experiments demonstrate that PanoAffordanceNet significantly outperforms existing methods, establishing a solid baseline for scene-level perception in embodied intelligence. The source code and benchmark dataset will be made publicly available at https://github.com/GL-ZHU925/PanoAffordanceNet.
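To make the ERP distortion concrete: in an equirectangular image, the spherical area covered by a pixel row shrinks with the cosine of its latitude, so polar rows are heavily over-represented. The sketch below computes this standard cos-latitude correction (the factor used in spherically weighted metrics such as WS-PSNR); it is an illustrative baseline for latitude-dependent calibration, not the paper's DASM module, and the function name is our own.

```python
import numpy as np

def erp_latitude_weights(height: int) -> np.ndarray:
    """Per-row cos(latitude) area weights for an equirectangular (ERP) image.

    Rows near the poles cover far less spherical area than equatorial rows;
    cos(latitude) is the standard correction factor for this ERP stretching.
    """
    # Map row centers to latitudes in (-pi/2, pi/2): +pi/2 at the top row.
    rows = (np.arange(height) + 0.5) / height   # normalized row centers in (0, 1)
    latitudes = (0.5 - rows) * np.pi
    return np.cos(latitudes)

weights = erp_latitude_weights(512)
# Equatorial rows get weight ~1; polar rows approach 0, reflecting
# how strongly ERP stretches content near the poles.
```

Such per-row weights can be applied to a loss or to feature maps so that the heavily distorted polar regions do not dominate training.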