Autonomous systems need to process large-scale, sparse, and irregular point clouds with limited compute resources. Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective. Although naively enlarging the 3D kernel size can enhance performance, it also incurs a cubically increasing overhead. Therefore, it is crucial to develop streamlined 3D large kernel designs that eliminate redundant weights and work effectively with larger kernels. In this paper, we propose an efficient and effective Large Sparse Kernel 3D Neural Network (LSK3DNet) that leverages dynamic pruning to amplify the 3D kernel size. Our method comprises two core components: Spatial-wise Dynamic Sparsity (SDS) and Channel-wise Weight Selection (CWS). SDS dynamically prunes and regrows volumetric weights from the start of training to learn a large sparse 3D kernel. It not only boosts performance but also significantly reduces model size and computational cost. Moreover, CWS selects the most important channels for 3D convolution during training and subsequently prunes the redundant channels to accelerate inference for 3D vision tasks. We demonstrate the effectiveness of LSK3DNet on three benchmark datasets and five tracks, comparing it with classical models and large kernel designs. Notably, LSK3DNet achieves state-of-the-art performance on SemanticKITTI (i.e., 75.6% on single-scan and 63.4% on multi-scan), with roughly a 40% reduction in model size and a 60% reduction in computing operations compared to the naive large 3D kernel model.
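The prune-and-regrow idea behind SDS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the pruning or regrowth criteria, so the magnitude-based pruning, the random regrowth, the zero initialization of regrown weights, the dense `(k, k, k)` layout, and the name `prune_and_regrow` are all assumptions made for illustration.

```python
import numpy as np


def prune_and_regrow(weights, mask, prune_fraction, rng):
    """One illustrative prune-and-regrow step on a single 3D kernel.

    weights: (k, k, k) float array of kernel weights (hypothetical layout).
    mask:    (k, k, k) boolean array, True where a weight is active.
    Returns updated copies of (weights, mask); the number of active
    weights, and therefore the sparsity level, is unchanged.
    """
    weights = weights.copy()
    mask = mask.copy()
    flat_w = weights.reshape(-1)  # views into the fresh contiguous copies
    flat_m = mask.reshape(-1)

    active = np.flatnonzero(flat_m)
    inactive = np.flatnonzero(~flat_m)
    n_swap = min(int(prune_fraction * active.size), inactive.size)
    if n_swap == 0:
        return weights, mask

    # Prune (assumed criterion): drop the smallest-magnitude active weights.
    order = np.argsort(np.abs(flat_w[active]))
    pruned = active[order[:n_swap]]
    flat_m[pruned] = False
    flat_w[pruned] = 0.0

    # Regrow (assumed criterion): activate random, previously inactive
    # positions, starting from zero so training can learn their values.
    regrown = rng.choice(inactive, size=n_swap, replace=False)
    flat_m[regrown] = True
    flat_w[regrown] = 0.0
    return weights, mask


# Example with a hypothetical 9x9x9 kernel at roughly 40% density.
rng = np.random.default_rng(0)
k = 9
mask = rng.random((k, k, k)) < 0.4
weights = rng.normal(size=(k, k, k)) * mask
new_weights, new_mask = prune_and_regrow(weights, mask, 0.3, rng)
```

In a training loop, a step like this would typically run every few iterations, so the set of active positions can move while the parameter budget stays fixed. The dense `(k, k, k)` kernel also shows why naive enlargement is costly: the weight count per channel pair grows as the cube of `k`.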