Fine-grained 3D object detection is a core ability for agents to understand their 3D environment and interact with surrounding objects. However, current methods and benchmarks mainly focus on relatively large objects, and 3D object detectors still struggle with small objects because of their weak geometric information. Through an in-depth study, we find that increasing the spatial resolution of the feature maps significantly boosts 3D small object detection performance. More interestingly, although the computational overhead increases dramatically with resolution, this growth comes mainly from the upsampling operations in the decoder. Inspired by this, we present DSPDet3D, a high-resolution multi-level detector with dynamic spatial pruning. It detects objects from large to small through iterative upsampling, and at the same time prunes the spatial representation of the scene in regions where no smaller object remains to be detected at higher resolution. We organize two benchmarks on the ScanNet and TO-SCENE datasets to evaluate fine-grained 3D object detection. On these benchmarks, DSPDet3D raises small-object detection performance to a new level while achieving leading inference speed compared with existing 3D object detection methods. Moreover, DSPDet3D trained only on ScanNet rooms generalizes well to larger-scale scenes: on a single RTX 3090 GPU, it takes less than 2s to directly process a whole house or building consisting of dozens of rooms while detecting almost all objects, ranging from bottles to beds. Project page: https://xuxw98.github.io/DSPDet3D/.
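The cost argument behind dynamic spatial pruning can be illustrated with a toy count. Each 2x upsampling splits a voxel into 8 children, so the voxels a decoder must process grow quickly unless regions are pruned before upsampling. The sketch below is a minimal illustration under stated assumptions, not DSPDet3D's implementation. It assumes the scene is a set of integer voxel coordinates with 2x upsampling per level, and it uses a hypothetical `keep` callback in place of the learned pruning decision. `upsample`, `coarse_to_fine`, and `in_slab` are illustrative names, and no detection head is modeled. It only reports how many voxels each level processes.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

Voxel = Tuple[int, int, int]


def upsample(voxels: Set[Voxel]) -> Set[Voxel]:
    """Split every voxel into its 8 children at twice the resolution."""
    return {
        (2 * x + dx, 2 * y + dy, 2 * z + dz)
        for (x, y, z) in voxels
        for dx, dy, dz in product((0, 1), repeat=3)
    }


def coarse_to_fine(
    coarse: Set[Voxel],
    num_levels: int,
    keep: Callable[[Voxel, int], bool],
) -> List[int]:
    """Return the number of voxels processed at each level, coarse to fine.

    `keep(voxel, level)` is a hypothetical stand-in for a learned decision
    "a smaller object may still be found here at higher resolution".
    """
    active = set(coarse)
    processed = []
    for level in range(num_levels):
        processed.append(len(active))
        # A real detector would predict boxes for this object scale here.
        if level == num_levels - 1:
            break
        # Prune voxels with no smaller object expected, then upsample the rest.
        kept = {v for v in active if keep(v, level)}
        active = upsample(kept)
    return processed


# Toy scene: a dense 4x4x4 coarse grid (real scenes are sparse surfaces).
grid = set(product(range(4), repeat=3))


def in_slab(v: Voxel, level: int) -> bool:
    # Hypothetical small-object region: the coarse voxels with x == 0,
    # expressed in the coordinates of the current level.
    return (v[0] >> level) == 0


print(coarse_to_fine(grid, 3, in_slab))             # [64, 128, 1024]
print(coarse_to_fine(grid, 3, lambda v, l: True))   # [64, 512, 4096]
```

Without pruning, the count grows by 8x per level on this dense toy grid. Pruning before each upsampling step confines the high-resolution work to the region that still matters. On real point clouds the occupied voxels lie near surfaces, so the unpruned growth per level is smaller than 8x, but the same pattern applies.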