Extending the success of 2D large kernels to 3D perception is challenging for two reasons: 1. the cubically increasing overhead of processing 3D data; 2. the optimization difficulties arising from data scarcity and sparsity. Previous work took the first step of scaling up the kernel size from 3x3x3 to 7x7x7 by introducing block-shared weights. However, to reduce feature variation within a block, it employs only a modest block size and fails to reach larger kernels such as 21x21x21. To address this issue, we propose LinK, a method that achieves a wider-range receptive field in a convolution-like manner through two core designs. The first replaces the static kernel matrix with a linear kernel generator, which adaptively provides weights only for non-empty voxels. The second reuses pre-computed aggregation results in overlapping blocks to reduce computational complexity. With these designs, each voxel can perceive context within a 21x21x21 range. Extensive experiments on two basic perception tasks, 3D object detection and 3D semantic segmentation, demonstrate the effectiveness of our method. Notably, we rank 1st on the public leaderboard of the nuScenes 3D detection benchmark (LiDAR track) by simply incorporating a LinK-based backbone into the basic detector CenterPoint. We also improve the mIoU of a strong segmentation baseline by 2.7% on the SemanticKITTI test set. Code is available at https://github.com/MCG-NJU/LinK.
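To see why a *linear* kernel generator makes reuse possible, the following toy sketch may help. It is not LinK's implementation. It assumes scalar voxel features and a generator that is affine in the relative offset, w(d) = a·d + b. Under that assumption the output for voxel i factors as y_i = a·(S1 - p_i·S0) + b·S0. Here S0 = Σ_j f_j and S1 = Σ_j p_j f_j are taken over the non-empty voxels j in the (2r+1)^3 window around p_i. Those window sums can therefore be pre-computed once and shared by every overlapping window. As a stand-in for the paper's block-reuse scheme, the sketch uses a dense 3D summed-area table. All function names (`linear_weight`, `naive_aggregate`, `reuse_aggregate`, etc.) are hypothetical.

```python
import itertools
import random


def linear_weight(a, b, d):
    """Hypothetical linear kernel generator: w(d) = a . d + b for offset d."""
    return sum(ak * dk for ak, dk in zip(a, d)) + b


def naive_aggregate(voxels, a, b, r):
    """Reference: for each non-empty voxel p, sum w(q - p) * f_q over the
    non-empty voxels q inside the (2r+1)^3 cube centered at p."""
    out = {}
    for p in voxels:
        acc = 0.0
        for q, fq in voxels.items():
            d = tuple(qk - pk for qk, pk in zip(q, p))
            if max(abs(x) for x in d) <= r:
                acc += linear_weight(a, b, d) * fq
        out[p] = acc
    return out


def prefix_sums_3d(voxels, n, channel):
    """Summed-area table of channel(p, f) over an n^3 grid; empty cells count 0."""
    P = [[[0.0] * (n + 1) for _ in range(n + 1)] for _ in range(n + 1)]
    for x, y, z in itertools.product(range(n), repeat=3):
        v = channel((x, y, z), voxels[(x, y, z)]) if (x, y, z) in voxels else 0.0
        P[x + 1][y + 1][z + 1] = (
            v + P[x][y + 1][z + 1] + P[x + 1][y][z + 1] + P[x + 1][y + 1][z]
            - P[x][y][z + 1] - P[x][y + 1][z] - P[x + 1][y][z] + P[x][y][z]
        )
    return P


def box_sum(P, lo, hi):
    """Sum over the inclusive box [lo, hi] via 3D inclusion-exclusion."""
    (x0, y0, z0), (x1, y1, z1) = lo, (hi[0] + 1, hi[1] + 1, hi[2] + 1)
    return (P[x1][y1][z1] - P[x0][y1][z1] - P[x1][y0][z1] - P[x1][y1][z0]
            + P[x0][y0][z1] + P[x0][y1][z0] + P[x1][y0][z0] - P[x0][y0][z0])


def reuse_aggregate(voxels, a, b, r, n):
    """Same result as naive_aggregate, but each window costs O(1): the shared
    tables for S0 = sum f and S1 = sum p * f are built once and reused."""
    tables = [prefix_sums_3d(voxels, n, lambda p, f: f)]
    for k in range(3):
        tables.append(prefix_sums_3d(voxels, n, lambda p, f, k=k: p[k] * f))
    out = {}
    for p in voxels:
        lo = tuple(max(c - r, 0) for c in p)      # clamp window to the grid
        hi = tuple(min(c + r, n - 1) for c in p)
        s0 = box_sum(tables[0], lo, hi)
        s1 = [box_sum(t, lo, hi) for t in tables[1:]]
        # y_p = a . (S1 - p * S0) + b * S0
        out[p] = sum(ak * (s1k - pk * s0) for ak, s1k, pk in zip(a, s1, p)) + b * s0
    return out


if __name__ == "__main__":
    random.seed(0)
    n = 8
    coords = random.sample(list(itertools.product(range(n), repeat=3)), 40)
    vox = {c: random.uniform(-1.0, 1.0) for c in coords}  # sparse: 40 of 512 cells
    ref = naive_aggregate(vox, (0.5, -1.0, 2.0), 0.3, r=2)
    fast = reuse_aggregate(vox, (0.5, -1.0, 2.0), 0.3, r=2, n=n)
    print(max(abs(ref[p] - fast[p]) for p in ref))  # tiny float round-off
```

After the tables are built, the per-voxel cost in `reuse_aggregate` no longer depends on the window radius r. That independence is the property that makes very large windows affordable. The dense table, however, ignores sparsity, so a practical backbone would need a sparse, block-based variant such as the one the abstract describes.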