Indoor monocular semantic scene completion (MSSC) is notably more challenging than its outdoor counterpart due to complex spatial layouts and severe occlusions. While transformers are well suited to modeling global dependencies, their high memory cost and difficulty in reconstructing fine-grained details have limited their use in indoor MSSC. To address these limitations, we introduce AdaSFormer, a serialized transformer framework tailored for indoor MSSC. Our model features three key designs: (1) an Adaptive Serialized Transformer with learnable shifts that dynamically adjust receptive fields; (2) a Center-Relative Positional Encoding that captures the varying richness of spatial information; and (3) a Convolution-Modulated Layer Normalization that bridges the heterogeneous representations of convolutional and transformer features. Extensive experiments on NYUv2 and Occ-ScanNet demonstrate that AdaSFormer achieves state-of-the-art performance. The code is publicly available at: https://github.com/alanWXZ/AdaSFormer.
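To make the third design concrete, the sketch below illustrates one plausible reading of a Convolution-Modulated Layer Normalization: a standard LayerNorm over transformer tokens whose per-position scale and shift are predicted from the convolutional features, so the conv branch conditions the normalization of the transformer branch. This is a hypothetical NumPy illustration, not the paper's implementation; the function names, the linear projections `W_gamma`/`W_beta`, and the per-token modulation scheme are all assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Standard LayerNorm: normalize each token over its channel (last) axis.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def conv_modulated_layer_norm(trans_feat, conv_feat, W_gamma, W_beta):
    """Hypothetical sketch: LayerNorm on transformer tokens whose affine
    parameters (scale gamma, shift beta) are predicted per position from
    the convolutional features, letting the conv branch modulate the
    transformer branch instead of using fixed learned gamma/beta."""
    gamma = conv_feat @ W_gamma  # (N, C) per-token scale from conv branch
    beta = conv_feat @ W_beta    # (N, C) per-token shift from conv branch
    return gamma * layer_norm(trans_feat) + beta

# Toy usage: 4 tokens with 8 channels from each branch.
rng = np.random.default_rng(0)
N, C = 4, 8
trans_tokens = rng.standard_normal((N, C))   # transformer features
conv_tokens = rng.standard_normal((N, C))    # conv features (assumed flattened)
W_gamma = rng.standard_normal((C, C)) * 0.1  # assumed learned projections
W_beta = rng.standard_normal((C, C)) * 0.1
out = conv_modulated_layer_norm(trans_tokens, conv_tokens, W_gamma, W_beta)
print(out.shape)  # (4, 8)
```

Compared with a plain LayerNorm (fixed gamma/beta shared across positions), this FiLM-style conditioning gives each spatial location its own normalization parameters, which is one natural way to bridge heterogeneous conv/transformer representations.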