Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

This paper proposes an algorithm for automatically labeling 3D objects from 2D point or box prompts, with a focus on applications in autonomous driving. Unlike prior work, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset. We propose a Segment, Lift, and Fit (SLF) paradigm to achieve this goal. First, we segment high-quality instance masks from the prompts using the Segment Anything Model (SAM), which reduces the remaining problem to predicting 3D shapes from given 2D masks. This problem is ill-posed, and therefore challenging, because multiple 3D shapes can project to an identical mask. To tackle this issue, we then lift the 2D masks to 3D forms and use gradient descent to adjust their poses and shapes until the projections fit the masks and the surfaces conform to the surrounding LiDAR points. Notably, because we do not train on a specific dataset, the SLF auto-labeler does not overfit to biased annotation patterns in a training set as other methods do, which improves generalization across datasets. Experimental results on the KITTI dataset demonstrate that the SLF auto-labeler produces high-quality bounding box annotations, achieving an AP@0.5 IoU of nearly 90%. Detectors trained with the generated pseudo-labels perform nearly as well as those trained with ground-truth annotations. Furthermore, the SLF auto-labeler shows promising results in detailed shape prediction, providing a potential alternative for the occupancy annotation of dynamic objects.
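
The lift-and-fit idea can be illustrated with a deliberately small toy problem. The sketch below is not the SLF implementation: it assumes a 2D bird's-eye view, replaces the 3D shape with a circle (center `cx`, `cz` and radius `r`), represents the mask as the angular interval the object subtends from a sensor at the origin, and uses finite-difference gradients instead of automatic differentiation. All function names, the weight `w_mask`, the step size, and the synthetic data are assumptions made for illustration.

```python
import math

def silhouette(cx, cz, r):
    """Angular interval (lo, hi) subtended by a circle seen from the origin."""
    d = math.hypot(cx, cz)
    bearing = math.atan2(cx, cz)           # z is the forward axis
    half = math.asin(min(r / d, 1.0))      # half-width of the silhouette
    return bearing - half, bearing + half

def loss(params, points, mask, w_mask=50.0):
    """Surface-to-LiDAR term plus silhouette-to-mask term (w_mask is arbitrary)."""
    cx, cz, r = params
    surf = sum((math.hypot(px - cx, pz - cz) - r) ** 2
               for px, pz in points) / len(points)
    lo, hi = silhouette(cx, cz, r)
    return surf + w_mask * ((lo - mask[0]) ** 2 + (hi - mask[1]) ** 2)

def grad(f, params, h=1e-6):
    """Central finite-difference gradient, used here to keep the sketch short."""
    g = []
    for i in range(len(params)):
        up, dn = list(params), list(params)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

def lift(points, mask):
    """Crude initial guess: bearing from the mask, depth from the mean LiDAR range."""
    lo, hi = mask
    bearing, half = (lo + hi) / 2, (hi - lo) / 2
    rng = sum(math.hypot(px, pz) for px, pz in points) / len(points)
    r0 = rng * math.sin(half)
    d0 = rng + r0
    return [d0 * math.sin(bearing), d0 * math.cos(bearing), r0]

def fit(points, mask, steps=1000, lr=0.1):
    """Plain gradient descent on pose (cx, cz) and shape (r)."""
    params = lift(points, mask)
    f = lambda p: loss(p, points, mask)
    for _ in range(steps):
        params = [p - lr * gi for p, gi in zip(params, grad(f, params))]
    return params

def synth(cx, cz, r, n=25):
    """Synthetic observations: points on the sensor-facing arc and the exact mask."""
    d = math.hypot(cx, cz)
    nx, nz = -cx / d, -cz / d              # unit vector from the center to the sensor
    tx, tz = -nz, nx                       # tangent direction
    pts = []
    for k in range(n):
        phi = math.radians(-60 + 120 * k / (n - 1))
        pts.append((cx + r * (math.cos(phi) * nx + math.sin(phi) * tx),
                    cz + r * (math.cos(phi) * nz + math.sin(phi) * tz)))
    return pts, silhouette(cx, cz, r)

points, mask = synth(1.0, 10.0, 1.0)
est = fit(points, mask)
print("initial guess:", lift(points, mask))
print("fitted (cx, cz, r):", est)
```

In this toy setting the LiDAR points cover only the sensor-facing arc, so depth and radius are weakly constrained by the surface term alone; the silhouette term supplies the remaining constraint. This mirrors, in a much simplified form, why the abstract combines mask fitting with LiDAR conformity.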
