Parameter-Free Adaptive Multi-Scale Channel-Spatial Attention Aggregation framework for 3D Indoor Semantic Scene Completion Toward Assisting Visually Impaired


Qi He, XiangXiang Wang, Jingtao Zhang, Yongbin Yu, Hongxiang Chu, Manping Fan, JingYe Cai, Zhenglin Yang

From arXiv; 17 pages, 9 figures, 5 tables

In indoor assistive perception for visually impaired users, 3D Semantic Scene Completion (SSC) is expected to provide structurally coherent and semantically consistent occupancy under strictly monocular vision for safety-critical scene understanding. However, existing monocular SSC approaches often lack explicit modeling of voxel-feature reliability and regulated cross-scale information propagation during 2D-3D projection and multi-scale fusion, making them vulnerable to projection diffusion and feature entanglement and thus limiting structural stability. To address these challenges, this paper presents an Adaptive Multi-scale Attention Aggregation (AMAA) framework built upon the MonoScene pipeline. Rather than introducing a heavier backbone, AMAA focuses on reliability-oriented feature regulation within a monocular SSC framework. Specifically, lifted voxel features are jointly calibrated in semantic and spatial dimensions through parallel channel-spatial attention aggregation, while multi-scale encoder-decoder fusion is stabilized via a hierarchical adaptive feature-gating strategy that regulates information injection across scales. Experiments on the NYUv2 benchmark demonstrate consistent improvements over MonoScene without significantly increasing system complexity: AMAA achieves 27.25% SSC mIoU (+0.31) and 43.10% SC IoU (+0.59). In addition, system-level deployment on an NVIDIA Jetson platform verifies that the complete AMAA framework can be executed stably on embedded hardware. Overall, AMAA improves monocular SSC quality and provides a reliable and deployable perception framework for indoor assistive systems targeting visually impaired users.

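
The abstract does not give the formulas behind the two mechanisms it names, so the following is only a minimal NumPy sketch of what parameter-free parallel channel-spatial attention and gated skip fusion could look like on a voxel grid. The function names (`channel_spatial_attention`, `gated_fusion`), the centered-sigmoid gates, and the averaging of the two branches are all assumptions made for illustration; they are not taken from the paper and should not be read as the AMAA implementation.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_spatial_attention(voxels):
    """Re-weight lifted voxel features of shape (C, X, Y, Z) without learned weights.

    Hypothetical design: both gates are computed from the same input (parallel
    branches) using only pooling statistics, then the two results are averaged.
    """
    # Channel branch: one descriptor per channel via global average pooling.
    channel_desc = voxels.mean(axis=(1, 2, 3), keepdims=True)      # (C, 1, 1, 1)
    channel_gate = sigmoid(channel_desc - channel_desc.mean())

    # Spatial branch: one descriptor per voxel via averaging over channels.
    spatial_desc = voxels.mean(axis=0, keepdims=True)              # (1, X, Y, Z)
    spatial_gate = sigmoid(spatial_desc - spatial_desc.mean())

    # Aggregate the two calibrated copies of the input.
    return 0.5 * (voxels * channel_gate + voxels * spatial_gate)


def gated_fusion(decoder_feat, skip_feat):
    """Inject encoder (skip) features into decoder features through a gate.

    Hypothetical gate: per-voxel agreement between the two feature maps,
    squashed to (0, 1), controls how much of the skip signal is added.
    """
    agreement = (decoder_feat * skip_feat).mean(axis=0, keepdims=True)  # (1, X, Y, Z)
    gate = sigmoid(agreement)
    return decoder_feat + gate * skip_feat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lifted = rng.normal(size=(8, 4, 4, 4))    # toy voxel features: C=8 on a 4x4x4 grid
    encoder = rng.normal(size=(8, 4, 4, 4))   # toy skip features at the same scale

    calibrated = channel_spatial_attention(lifted)
    fused = gated_fusion(calibrated, encoder)
    print(calibrated.shape, fused.shape)
```

Because neither function holds trainable weights, a sketch like this adds no parameters to the network, which is in the spirit of the "parameter-free" wording in the title. A real system would apply such a gate at each decoder scale; the single-scale call here only shows the data flow.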