Detection transformers (DETRs) have gradually come to dominate 2D detection thanks to their elegant framework. However, DETR-based detectors for 3D point clouds still struggle to achieve satisfactory performance. We argue that the main challenges are twofold: 1) obtaining appropriate object queries is difficult because point clouds are highly sparse and unevenly distributed; 2) effective query interaction that exploits the rich geometric structure of point clouds has not been fully explored. To this end, we propose a simple and effective 3D DETR method (SEED) for detecting 3D objects from point clouds, which comprises a dual query selection (DQS) module and a deformable grid attention (DGA) module. More concretely, to obtain appropriate queries, DQS first retains a large number of queries according to their predicted confidence scores to ensure high recall, and then picks out high-quality queries according to their estimated quality scores. DGA uniformly divides each reference box into grids that serve as reference points, and then uses predicted offsets to achieve a flexible receptive field, allowing the network to focus on relevant regions and capture more informative features. Extensive ablation studies on DQS and DGA demonstrate their effectiveness. Furthermore, SEED achieves state-of-the-art detection performance on both the large-scale Waymo and nuScenes datasets, illustrating the superiority of the proposed method. The code is available at https://github.com/happinesslz/SEED
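As a rough illustration of the two modules described in the abstract, the following NumPy sketch shows a two-stage query selection (confidence first, then quality) and the generation of grid reference points inside a box, shifted by predicted offsets. It is not the SEED implementation. The function names, the bird's-eye-view box parameterization `(cx, cy, w, l, yaw)`, the use of grid cell centres as reference points, and the convention that offsets are expressed in the box-local frame are all assumptions made for this example.

```python
import numpy as np


def dual_query_selection(confidence, quality, num_coarse, num_final):
    """Hypothetical two-stage selection: top-k by confidence, then top-k by quality.

    Returns indices into the original candidate arrays.
    """
    confidence = np.asarray(confidence, dtype=float)
    quality = np.asarray(quality, dtype=float)
    # Stage 1: keep many candidates by confidence score to preserve recall.
    coarse = np.argsort(-confidence, kind="stable")[:num_coarse]
    # Stage 2: among the survivors, keep those with the highest quality score.
    order = np.argsort(-quality[coarse], kind="stable")[:num_final]
    return coarse[order]


def grid_reference_points(box, grid_size, offsets=None):
    """Hypothetical grid sampling for one bird's-eye-view reference box.

    box: (cx, cy, w, l, yaw); w spans the local x axis, l the local y axis.
    offsets: optional (grid_size * grid_size, 2) array in the box-local frame
             (assumed convention), standing in for network-predicted offsets.
    Returns an array of shape (grid_size * grid_size, 2) in world coordinates.
    """
    cx, cy, w, l, yaw = box
    # Cell centres of a uniform grid, in normalized box coordinates (-0.5, 0.5).
    ticks = (np.arange(grid_size) + 0.5) / grid_size - 0.5
    gx, gy = np.meshgrid(ticks, ticks, indexing="ij")
    local = np.stack([gx.ravel() * w, gy.ravel() * l], axis=1)
    if offsets is not None:
        # Offsets move the sampling locations away from the fixed grid.
        local = local + np.asarray(offsets, dtype=float)
    # Rotate by the box heading and translate to the box centre.
    c, s = np.cos(yaw), np.sin(yaw)
    rotation = np.array([[c, -s], [s, c]])
    return local @ rotation.T + np.array([cx, cy])


if __name__ == "__main__":
    kept = dual_query_selection(
        confidence=[0.9, 0.1, 0.8, 0.7],
        quality=[0.2, 0.99, 0.6, 0.5],
        num_coarse=3,
        num_final=2,
    )
    print("selected query indices:", kept)
    points = grid_reference_points((1.0, 2.0, 2.0, 4.0, 0.0), grid_size=2)
    print("reference points:\n", points)
```

In this sketch the candidate with the highest quality score is still discarded if it does not pass the confidence stage, which is the point of selecting in two steps. In an actual detector the sampled locations would be used to gather features for attention; that step is omitted here.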