Towards 3D Semantic Scene Completion for Autonomous Driving: A Meta-Learning Framework Empowered by Deformable Large-Kernel Attention and Mamba Model

Semantic scene completion (SSC) is essential for comprehensive perception in autonomous driving systems. However, existing SSC methods often overlook the high cost of deployment in real-world applications. Traditional architectures, such as 3D convolutional neural networks (3D CNNs) and self-attention mechanisms, struggle to capture long-range dependencies within 3D voxel grids efficiently, which limits their effectiveness. To address these issues, we introduce MetaSSC, a novel meta-learning-based framework for SSC that combines deformable convolution, large-kernel attention, and the Mamba model (D-LKA-M). Our approach begins with a voxel-based semantic segmentation (SS) pretraining task that explores the semantics and geometry of incomplete regions and acquires transferable meta-knowledge. On simulated cooperative perception datasets, we supervise the perception training of a single vehicle with sensor data aggregated from multiple nearby connected autonomous vehicles (CAVs), which yields richer and more complete labels. This meta-knowledge is then adapted to the target domain through a dual-phase training strategy that adds no extra model parameters, enabling efficient deployment. To further strengthen the model's ability to capture long-sequence relationships within 3D voxel grids, we integrate Mamba blocks with deformable convolution and large-kernel attention into the backbone network. Extensive experiments demonstrate that MetaSSC achieves state-of-the-art performance, significantly outperforming competing models while also reducing deployment costs.

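The abstract does not specify how sensor data from multiple CAVs is aggregated into labels. As a minimal sketch of the general idea, the example below fuses sparse semantic voxel maps from several vehicles by majority vote. It assumes the maps are already registered in a shared coordinate frame; the function name `aggregate_voxel_labels`, the `UNKNOWN` label id, and the voting rule are hypothetical illustrations, not the authors' method.

```python
from collections import Counter, defaultdict

UNKNOWN = 0  # hypothetical label id for unobserved voxels (an assumption)


def aggregate_voxel_labels(per_vehicle_voxels):
    """Fuse sparse semantic voxel maps from several vehicles by majority vote.

    Each input map is a dict {(x, y, z): label}, assumed to be already
    transformed into one shared coordinate frame.
    """
    votes = defaultdict(Counter)
    for voxels in per_vehicle_voxels:
        for coord, label in voxels.items():
            if label != UNKNOWN:  # unobserved voxels cast no vote
                votes[coord][label] += 1

    fused = {}
    for coord, counter in votes.items():
        best = max(counter.values())
        # Break ties by the smallest label id so the result is deterministic.
        fused[coord] = min(l for l, c in counter.items() if c == best)
    return fused


# Toy example: the ego vehicle sees two voxels; two nearby CAVs fill in more.
ego = {(0, 0, 0): 1, (1, 0, 0): 2, (2, 0, 0): UNKNOWN}
cav_a = {(1, 0, 0): 2, (2, 0, 0): 3}
cav_b = {(1, 0, 0): 4, (3, 0, 0): 1}
fused = aggregate_voxel_labels([ego, cav_a, cav_b])
```

In this toy case the fused map labels four voxels, while the ego vehicle alone labels two, which illustrates why aggregated supervision can be richer than single-vehicle labels.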