In this paper, we propose a new framework for 3D small object detection. Although deep learning-based 3D object detection methods have achieved great success in recent years, current methods still struggle with small objects because of their weak geometric information. Through in-depth study, we find that increasing the spatial resolution of the feature maps significantly boosts the performance of 3D small object detection. More interestingly, although the computational overhead increases dramatically with resolution, the growth comes mainly from the upsampling operations of the decoder. Inspired by this, we present a high-resolution multi-level detector with dynamic spatial pruning, named DSPDet3D, which detects objects from large to small by iterative upsampling while pruning the spatial representation of the scene in regions where no smaller object remains to be detected at higher resolution. Since a 3D detector only needs to predict sparse bounding boxes, pruning a large number of uninformative features does not degrade detection performance but significantly reduces the computational cost of upsampling. In this way, DSPDet3D achieves high accuracy on small object detection while requiring even less memory and inference time. On the ScanNet and TO-SCENE datasets, our method improves the detection performance on small objects to a new level while achieving leading inference speed among all mainstream indoor 3D object detection methods.
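To make the cost argument concrete, the toy sketch below (plain Python, not the authors' implementation) models a decoder as a sparse set of voxels that is upsampled 2x per level, where each voxel becomes its 2x2x2 children. It compares voxel counts with and without pruning before each upsampling step, using the voxel count as an assumed rough proxy for upsampling cost. The pruning rule `oracle_keep` and the scene layout `SMALL_OBJECT_CELLS` are hypothetical stand-ins for whatever decision DSPDet3D makes about where smaller objects remain; the per-level box prediction is omitted.

```python
from itertools import product
from typing import Callable, List, Optional, Set, Tuple

Voxel = Tuple[int, int, int]


def upsample(voxels: Set[Voxel]) -> Set[Voxel]:
    """Replace each voxel with its 2x2x2 children at the next finer level."""
    return {
        (2 * x + dx, 2 * y + dy, 2 * z + dz)
        for (x, y, z) in voxels
        for dx, dy, dz in product((0, 1), repeat=3)
    }


def decode(
    coarse: Set[Voxel],
    num_levels: int,
    keep: Optional[Callable[[Voxel, int], bool]] = None,
) -> List[int]:
    """Iteratively upsample a sparse voxel set, optionally pruning first.

    Returns the number of voxels produced at each finer level.
    """
    voxels = set(coarse)
    counts = []
    for level in range(num_levels):
        # A real detector would predict boxes for objects of this scale here.
        if keep is not None:
            # Drop voxels where no smaller object remains to be detected.
            voxels = {v for v in voxels if keep(v, level)}
        voxels = upsample(voxels)
        counts.append(len(voxels))
    return counts


# Hypothetical toy scene: a dense 4x4x4 coarse grid in which small
# objects occupy only two coarse cells.
SMALL_OBJECT_CELLS = {(0, 0, 0), (3, 3, 3)}


def oracle_keep(v: Voxel, level: int) -> bool:
    """Hypothetical pruning rule: map v back to the coarsest grid and test it."""
    return tuple(c >> level for c in v) in SMALL_OBJECT_CELLS


scene = set(product(range(4), repeat=3))
print(decode(scene, 3))               # [512, 4096, 32768]
print(decode(scene, 3, oracle_keep))  # [16, 128, 1024]
```

Without pruning, the voxel count grows 8x at every level. With pruning, growth is confined to the regions that still contain smaller objects, so the finest level here is 32x smaller.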