Attention-based Proposals Refinement for 3D Object Detection

Recent advances in 3D object detection have been made by developing the refinement stage for voxel-based Region Proposal Networks (RPN) to better balance accuracy and efficiency. A popular approach among state-of-the-art frameworks is to divide proposals, or Regions of Interest (ROI), into grids and extract a feature for each grid location before synthesizing them to form the ROI feature. While achieving impressive performance, such an approach involves a number of hand-crafted components (e.g., grid sampling, set abstraction) that require expert knowledge to tune correctly. This paper proposes a data-driven approach to ROI feature computation named APRO3D-Net, which consists of a voxel-based RPN and a refinement stage built on Vector Attention. Unlike the original multi-head attention, Vector Attention assigns different weights to different channels within a point feature, and is thus able to capture a more sophisticated relation between pooled points and the ROI. Experiments on the KITTI validation set show that our method achieves a competitive 84.84 AP for class Car at Moderate difficulty while having the fewest parameters among closely related methods and attaining a quasi-real-time inference speed of 15 FPS on an NVIDIA V100 GPU. The code is released at https://github.com/quan-dao/APRO3D-Net.
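The difference between per-point and per-channel weighting described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the APRO3D-Net implementation: the subtraction relation, the single linear map `proj`, and the function names are assumptions chosen for illustration. The scalar variant gives each pooled point one weight shared by all channels, while the vector variant normalizes over points separately for every channel.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scalar_attention(query, keys, values):
    """Dot-product attention: one weight per pooled point, shared by all channels."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)  # N weights, one per point
    out = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(query))]
    return out, weights

def vector_attention(query, keys, values, proj):
    """Vector attention: one weight per pooled point and per channel.

    `proj` is a hypothetical C x C matrix standing in for a learned mapping;
    the subtraction relation is likewise an assumption of this sketch.
    """
    n, c = len(keys), len(query)
    relation = []
    for key in keys:
        diff = [q - k for q, k in zip(query, key)]
        relation.append([sum(diff[i] * proj[i][j] for i in range(c)) for j in range(c)])
    # Softmax over the N points, computed independently for each channel.
    per_channel = [softmax([relation[p][ch] for p in range(n)]) for ch in range(c)]
    weights = [[per_channel[ch][p] for ch in range(c)] for p in range(n)]
    out = [sum(weights[p][ch] * values[p][ch] for p in range(n)) for ch in range(c)]
    return out, weights

# Toy data: one ROI query, three pooled points, two feature channels.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

roi_scalar, w_scalar = scalar_attention(query, keys, values)
roi_vector, w_vector = vector_attention(query, keys, values, identity)
```

In this toy setup `w_scalar` holds three numbers, whereas `w_vector` holds a 3 x 2 table, so the same point can contribute strongly to one channel of the ROI feature and weakly to another.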


Related content

Attention mechanism


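The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment simultaneously in machine translation; their work is regarded as the first to apply attention to NLP. Attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of attention research.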
