Common deep learning models for 3D environment perception often use pillarization or voxelization to convert point cloud data into pillars or voxels, which are then processed by a 2D or 3D convolutional neural network (CNN). The pioneering work PointNet has been widely adopted as a local feature descriptor, a fundamental component of deep learning models for 3D perception, to extract features from a point cloud. It does so with a symmetric max-pooling operator, which yields a unique feature vector for each pillar or voxel. However, because max-pooling ignores most of the points, it causes an information loss that reduces model performance. To address this issue, we propose a novel local feature descriptor, mini-PointNetPlus, as a plug-and-play alternative to PointNet. The basic idea is to project the data points separately onto each of the individual features considered, with each projection being permutation invariant. As a result, the proposed descriptor transforms an unordered point cloud into a stably ordered one. We prove that the vanilla PointNet is a special case of mini-PointNetPlus. Because the proposed descriptor fully utilizes the features, our experiments demonstrate a considerable performance improvement for 3D perception.
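To make the contrast concrete, the following NumPy sketch compares PointNet-style max-pooling with one possible order-stabilizing descriptor. The abstract does not specify how mini-PointNetPlus is built, so `sorted_topk_descriptor`, its parameter `k`, and the choice of per-channel sorting are assumptions made purely for illustration and should not be read as the authors' method. The sketch only shows that both operators are permutation invariant, that max-pooling draws on at most one point per feature channel, and that this illustrative operator reduces to max-pooling when `k = 1`.

```python
import numpy as np


def max_pool_descriptor(features):
    """PointNet-style symmetric pooling over one pillar/voxel.

    features: array of shape (num_points, num_channels).
    Returns the per-channel maximum, shape (num_channels,).
    """
    return features.max(axis=0)


def sorted_topk_descriptor(features, k):
    """Hypothetical illustration, NOT the published mini-PointNetPlus.

    Sorts every feature channel independently in descending order and
    keeps the k largest values per channel, so the output does not depend
    on the input point order. Assumes num_points >= k.
    Returns a flat vector of shape (k * num_channels,).
    """
    ordered = -np.sort(-features, axis=0)  # descending sort per channel
    return ordered[:k].reshape(-1)


rng = np.random.default_rng(0)
features = rng.normal(size=(32, 4))       # 32 points, 4 feature channels
shuffled = features[rng.permutation(32)]  # same points, different order

# Both descriptors are invariant to the order of the points.
assert np.array_equal(max_pool_descriptor(features),
                      max_pool_descriptor(shuffled))
assert np.array_equal(sorted_topk_descriptor(features, 3),
                      sorted_topk_descriptor(shuffled, 3))

# With k = 1 the illustrative descriptor is exactly max-pooling.
assert np.array_equal(sorted_topk_descriptor(features, 1),
                      max_pool_descriptor(features))

# Max-pooling uses at most one point per channel; the rest are ignored.
contributing_points = np.unique(features.argmax(axis=0)).size
print(f"{contributing_points} of {features.shape[0]} points reach the max-pooled output")
```

Keeping more than one value per channel lets additional points influence the pillar or voxel feature, at the cost of a longer feature vector, which is the trade-off this sketch is meant to expose.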