We present 3DMiner -- a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. Unlike other unsupervised 3D reconstruction methods, we assume that, within a large-enough dataset, there must exist images of objects with similar shapes but varying backgrounds, textures, and viewpoints. Our approach leverages the recent advances in learning self-supervised image representations to cluster images with geometrically similar shapes and find common image correspondences between them. We then exploit these correspondences to obtain rough camera estimates as initialization for bundle-adjustment. Finally, for every image cluster, we apply a progressive bundle-adjusting reconstruction method to learn a neural occupancy field representing the underlying shape. We show that this procedure is robust to several types of errors introduced in previous steps (e.g., wrong camera poses, images containing dissimilar shapes, etc.), allowing us to obtain shape and pose annotations for images in-the-wild. When using images from Pix3D chairs, our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques, both quantitatively and qualitatively. Furthermore, we show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset. Project Page: https://ttchengab.github.io/3dminerOfficial
翻译:我们提出3DMiner——一种从具有挑战性的大规模无标注图像数据集中挖掘三维形状的流水线。与其他无监督三维重建方法不同,我们假设在足够大的数据集中,必然存在具有相似形状但背景、纹理和视角各异的物体图像。我们的方法利用自监督图像表示学习的最新进展,对几何形状相似的图像进行聚类,并发现它们之间的共同图像对应关系。进而利用这些对应关系获取粗略的相机估计,作为光束法平差的初始化。最后,针对每个图像聚类,我们采用渐进式光束法平差重建方法,学习代表潜在形状的神经占据场。实验表明,该流程对先前步骤引入的多种错误(如错误的相机姿态、包含不同形状的图像等)具有鲁棒性,从而能够为野外图像获取形状和姿态标注。当使用Pix3D椅子数据集中的图像时,我们的方法在定量和定性结果上均显著优于最先进的无监督三维重建技术。此外,我们展示如何将3DMiner应用于野外数据,通过重建LAION-5B数据集中图像包含的形状。项目主页:https://ttchengab.github.io/3dminerOfficial