Unsupervised 3D object detection methods have emerged to leverage vast amounts of data without requiring manual labels for training. Recent approaches rely on dynamic objects for learning to detect mobile objects, but they penalize detections of static instances during training. Multiple rounds of (self-)training are then used to add detected static instances to the set of training targets; this procedure improves performance but is computationally expensive. To address this, we propose UNION. We use spatial clustering and self-supervised scene flow to obtain a set of static and dynamic object proposals from LiDAR. Subsequently, we encode the visual appearance of the object proposals to distinguish static foreground objects from the background by selecting static instances that are visually similar to dynamic objects. As a result, static and dynamic mobile objects are obtained together, and existing detectors can be trained in a single training round. In addition, we extend 3D object discovery to detection by using appearance-based cluster labels as pseudo-class labels for training object classification. We conduct extensive experiments on the nuScenes dataset and set a new state of the art for unsupervised 3D object discovery, i.e., UNION more than doubles the average precision to 38.4. The code is available at github.com/TedLentsch/UNION.
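The static/dynamic split of the clustered proposals can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-point scene-flow vectors and cluster assignments have already been computed, and the speed threshold `thresh` is a hypothetical parameter chosen here for the example.

```python
import numpy as np

def split_static_dynamic(cluster_ids, flow, thresh=0.5):
    """Split object proposals (point clusters) into static and dynamic sets
    by thresholding the mean scene-flow magnitude of each cluster.

    cluster_ids: (N,) integer cluster label per LiDAR point
    flow:        (N, 3) self-supervised scene-flow vector per point
    thresh:      hypothetical speed threshold separating static from dynamic
    """
    speeds = np.linalg.norm(flow, axis=1)  # per-point flow magnitude
    static, dynamic = [], []
    for cid in np.unique(cluster_ids):
        mask = cluster_ids == cid
        # A cluster is dynamic if its points move faster than the threshold on average.
        if speeds[mask].mean() > thresh:
            dynamic.append(int(cid))
        else:
            static.append(int(cid))
    return static, dynamic

# Toy example: cluster 0 barely moves, cluster 1 moves ~1 m per sweep.
ids = np.array([0, 0, 1, 1])
flow = np.array([[0.01, 0.0, 0.0], [0.02, 0.0, 0.0],
                 [1.0, 0.0, 0.0], [0.9, 0.0, 0.0]])
print(split_static_dynamic(ids, flow))  # → ([0], [1])
```

In the full method, the static clusters returned here would then be filtered by visual-appearance similarity to the dynamic clusters, so that only static foreground objects are kept as training targets.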