Object detection in 3D point clouds is a crucial task in a range of computer vision applications, including robotics, autonomous cars, and augmented reality. This work addresses object detection in 3D point clouds using a highly efficient, surface-biased feature extraction method (wang2022rbgnet) while also capturing contextual cues at multiple levels. We propose a 3D object detector that extracts accurate feature representations of object candidates and applies self-attention to point patches, to object candidates, and to the global 3D scene. Self-attention has been shown to be effective at encoding correlation information in 3D point clouds (xie2020mlcvnet). Other 3D detectors focus on enhancing point cloud feature extraction by selectively obtaining more meaningful local features (wang2022rbgnet), but they overlook contextual information. To this end, the proposed architecture combines ray-based, surface-biased feature extraction with multi-level context encoding to outperform the state-of-the-art 3D object detector. 3D detection experiments are performed on scenes from the ScanNet dataset, with the self-attention modules introduced one at a time to isolate the effect of self-attention at each level.
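To make the object-candidate level of context encoding concrete, the following is a minimal sketch of single-head scaled dot-product self-attention over a set of candidate feature vectors. It is an illustration under stated assumptions, not the detector's implementation: the function names, the random projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and the sizes (8 candidates, 16 feature dimensions) are all hypothetical, and only the Python standard library is used.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax of one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features, w_q, w_k, w_v):
    """Single-head self-attention over n feature vectors of size d.

    features: n rows of length d (here: one row per object candidate).
    w_q, w_k, w_v: d x d projection matrices (hypothetical stand-ins
    for learned weights).
    Returns the refined features (input plus attended context) and the
    n x n attention weights.
    """
    q = matmul(features, w_q)
    k = matmul(features, w_k)
    v = matmul(features, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)  # pairwise candidate-to-candidate scores
    weights = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(weights, v)
    # Residual connection (an assumption): each candidate keeps its own
    # feature and adds context gathered from all candidates.
    refined = [
        [f + c for f, c in zip(f_row, c_row)]
        for f_row, c_row in zip(features, context)
    ]
    return refined, weights


rng = random.Random(0)
n_candidates, dim = 8, 16  # illustrative sizes only


def random_matrix(rows, cols, std):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


candidates = random_matrix(n_candidates, dim, 1.0)
w_q, w_k, w_v = (random_matrix(dim, dim, dim ** -0.5) for _ in range(3))
refined, attn = self_attention(candidates, w_q, w_k, w_v)
```

The same routine could in principle be applied to point patches or to a pooled scene representation by changing what each row of `features` stands for; how the detector actually wires these levels together is not specified in this abstract.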