This work explores the use of 3D generative models to synthesize training data for 3D vision tasks. Such generative models must satisfy two key requirements: the generated data should be photorealistic enough to match real-world scenes, and the corresponding 3D attributes should align with the given sampling labels. However, we find that recent NeRF-based 3D GANs struggle to meet these requirements because of how their generation pipelines are designed and because they lack explicit 3D supervision. In this work, we propose Lift3D, an inverted 2D-to-3D generation framework that achieves these data generation objectives. Lift3D has several merits compared to prior methods: (1) Unlike previous 3D GANs, whose output resolution is fixed after training, Lift3D can generalize to any camera intrinsics and produce higher-resolution, photorealistic output. (2) By lifting a well-disentangled 2D GAN to a 3D object NeRF, Lift3D provides explicit 3D information about the generated objects, thus offering accurate 3D annotations for downstream tasks. We evaluate the effectiveness of our framework by augmenting autonomous driving datasets. Experimental results demonstrate that our data generation framework can effectively improve the performance of 3D object detectors. Project page: https://len-li.github.io/lift3d-web.