SEED: A Simple and Effective 3D DETR in Point Clouds

Recently, detection transformers (DETRs) have gradually taken a dominant position in 2D detection thanks to their elegant framework. However, DETR-based detectors for 3D point clouds still struggle to achieve satisfactory performance. We argue that the main challenges are twofold: 1) obtaining appropriate object queries is difficult because point clouds are highly sparse and unevenly distributed; 2) how to implement effective query interaction by exploiting the rich geometric structure of point clouds has not been fully explored. To this end, we propose a simple and effective 3D DETR method (SEED) for detecting 3D objects from point clouds, which involves a dual query selection (DQS) module and a deformable grid attention (DGA) module. More concretely, to obtain appropriate queries, DQS first retains a large number of queries according to the predicted confidence scores to ensure high recall, and then picks out high-quality queries according to the estimated quality scores. DGA uniformly divides each reference box into grids whose cells serve as reference points, and then uses predicted offsets to achieve a flexible receptive field, allowing the network to focus on relevant regions and capture more informative features. Extensive ablation studies on DQS and DGA demonstrate their effectiveness. Furthermore, SEED achieves state-of-the-art detection performance on both the large-scale Waymo and nuScenes datasets, illustrating the superiority of the proposed method. The code is available at https://github.com/happinesslz/SEED
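
The two modules described above can be illustrated with a minimal, dependency-free Python sketch. It is not taken from the SEED repository: the function names `dual_query_selection` and `grid_reference_points`, their parameters, and the simplification to an axis-aligned bird's-eye-view box (center, length, width, no heading angle) are assumptions made for illustration only. In the actual method the scores and offsets would be predicted by the network; here they are passed in as plain numbers.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dual_query_selection(
    confidence: Sequence[float],
    quality: Sequence[float],
    num_coarse: int,
    num_final: int,
) -> List[int]:
    """Return the indices of the queries kept after two selection stages."""
    # Stage 1: keep many candidates by confidence so that recall stays high.
    by_confidence = sorted(
        range(len(confidence)), key=lambda i: confidence[i], reverse=True
    )
    coarse = by_confidence[:num_coarse]
    # Stage 2: among those candidates, keep the ones with the best quality score.
    by_quality = sorted(coarse, key=lambda i: quality[i], reverse=True)
    return by_quality[:num_final]


def grid_reference_points(
    box: Tuple[float, float, float, float],
    grid_size: int,
    offsets: Optional[Sequence[Point]] = None,
) -> List[Point]:
    """Cell centers of a grid_size x grid_size grid over a box, shifted by offsets.

    `box` is (center_x, center_y, length, width); the box is assumed to be
    axis-aligned, which is a simplification made for this sketch.
    """
    cx, cy, length, width = box
    points: List[Point] = []
    for row in range(grid_size):
        for col in range(grid_size):
            # Center of cell (row, col) in box-relative coordinates in [-0.5, 0.5].
            u = (col + 0.5) / grid_size - 0.5
            v = (row + 0.5) / grid_size - 0.5
            points.append((cx + u * length, cy + v * width))
    if offsets is not None:
        # Offsets move each reference point, giving a flexible receptive field.
        points = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, offsets)]
    return points
```

Calling `dual_query_selection` with a large `num_coarse` and a smaller `num_final` mirrors the recall-then-quality ordering of DQS, while the points returned by `grid_reference_points` are the locations at which an attention layer would sample features in a DGA-style module.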


