We present a surprisingly simple and efficient method for self-supervision of 3D backbones on automotive Lidar point clouds. We design a contrastive loss between features of Lidar scans captured in the same scene. Several such approaches have been proposed in the literature, ranging from PointContrast, which contrasts at the level of points, to the state-of-the-art TARL, which contrasts at the level of segments that roughly correspond to objects. While the former is very simple to implement, it is outperformed by the latter, which, however, requires costly pre-processing. In BEVContrast, we define our contrast at the level of 2D cells in the Bird's Eye View plane. The resulting cell-level representations offer a good trade-off between the point-level representations exploited in PointContrast and the segment-level representations exploited in TARL: we retain the simplicity of PointContrast (cell representations are cheap to compute) while surpassing the performance of TARL in downstream semantic segmentation.
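To make the idea of a cell-level contrast concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not say how point features are aggregated into cells or which contrastive loss is used, so the sketch assumes mean pooling per cell, two scans already registered in a common frame, and an InfoNCE-style loss. The function names `bev_cell_pool` and `cell_contrastive_loss`, and all parameters, are hypothetical.

```python
import numpy as np


def bev_cell_pool(points_xy, feats, cell_size, grid_min, grid_shape):
    """Average per-point features inside each 2D Bird's Eye View cell.

    Assumption: mean pooling; the abstract does not specify the pooling.
    Returns (pooled, occupied): pooled has shape (H*W, C) and occupied is a
    boolean mask of shape (H*W,) marking cells that contain at least one point.
    """
    H, W = grid_shape
    # Integer cell coordinates of every point in the BEV grid.
    idx = np.floor((points_xy - grid_min) / cell_size).astype(np.int64)
    inside = (
        (idx[:, 0] >= 0) & (idx[:, 0] < H) & (idx[:, 1] >= 0) & (idx[:, 1] < W)
    )
    flat = idx[inside, 0] * W + idx[inside, 1]

    sums = np.zeros((H * W, feats.shape[1]))
    counts = np.zeros(H * W)
    # Unbuffered scatter-add so that repeated cell indices accumulate.
    np.add.at(sums, flat, feats[inside])
    np.add.at(counts, flat, 1.0)

    occupied = counts > 0
    pooled = np.zeros_like(sums)
    pooled[occupied] = sums[occupied] / counts[occupied, None]
    return pooled, occupied


def cell_contrastive_loss(cells_a, cells_b, temperature=0.07):
    """InfoNCE-style loss (an assumption): row i of cells_a is the positive
    of row i of cells_b; every other row of cells_b acts as a negative."""
    eps = 1e-12
    a = cells_a / np.maximum(np.linalg.norm(cells_a, axis=1, keepdims=True), eps)
    b = cells_b / np.maximum(np.linalg.norm(cells_b, axis=1, keepdims=True), eps)
    logits = a @ b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

In use, one would pool the backbone features of two registered scans on the same grid, keep only the cells occupied in both (`occupied_a & occupied_b`), and pass those matching rows to `cell_contrastive_loss`. Pooling of this kind is a single scatter-add over the points, which is consistent with the claim that cell representations are cheap to compute.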