A fully automated object reconstruction pipeline is crucial for digital content creation. While the area of 3D reconstruction has witnessed profound developments, removing the background to obtain a clean object model still relies on various forms of manual labor, such as bounding box labeling, mask annotations, and mesh manipulations. In this paper, we propose a novel framework named AutoRecon for the automated discovery and reconstruction of an object from multi-view images. We demonstrate that foreground objects can be robustly located and segmented from SfM point clouds by leveraging self-supervised 2D vision transformer features. Then, we reconstruct decomposed neural scene representations with dense supervision provided by the decomposed point clouds, resulting in accurate object reconstruction and segmentation. Experiments on the DTU, BlendedMVS, and CO3D-V2 datasets demonstrate the effectiveness and robustness of AutoRecon.
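The idea of separating a foreground object from an SfM point cloud using per-point features can be illustrated with a toy sketch. This is not the AutoRecon implementation: it assumes that each 3D point already carries a feature vector aggregated from 2D vision transformer features (how that aggregation is done is not specified here), and it swaps in plain 2-means clustering plus a spatial-compactness heuristic as stand-ins for the actual segmentation step. The names `two_means` and `segment_foreground` are hypothetical.

```python
import numpy as np


def two_means(features, iters=20):
    """Plain 2-means on feature vectors; returns one boolean label per row."""
    # Deterministic init: the first point and the point farthest from it.
    c0 = features[0]
    c1 = features[np.argmax(np.linalg.norm(features - c0, axis=1))]
    for _ in range(iters):
        d0 = np.linalg.norm(features - c0, axis=1)
        d1 = np.linalg.norm(features - c1, axis=1)
        labels = d1 < d0  # True where a point is closer to centroid c1
        if labels.all() or not labels.any():
            break  # degenerate split: one cluster is empty
        c0 = features[~labels].mean(axis=0)
        c1 = features[labels].mean(axis=0)
    return labels


def spatial_spread(points):
    """Sum of per-axis standard deviations; infinite for an empty set."""
    return points.std(axis=0).sum() if len(points) else np.inf


def segment_foreground(points, features):
    """Return (foreground mask, bbox min corner, bbox max corner).

    points:   (N, 3) array of SfM point positions.
    features: (N, D) array of per-point features (assumed given).
    """
    mask = two_means(features)
    # Assumption for this sketch: the object is the spatially more
    # compact of the two feature clusters.
    if spatial_spread(points[mask]) > spatial_spread(points[~mask]):
        mask = ~mask
    fg = points[mask]
    return mask, fg.min(axis=0), fg.max(axis=0)
```

Calling `segment_foreground(points, features)` on an `(N, 3)` position array and an `(N, D)` feature array returns a boolean mask over the points together with an axis-aligned bounding box of the selected cluster. The compactness rule is only a convenient heuristic for synthetic data; cluttered real scenes would likely need a more careful criterion.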