MVImgNet is a large-scale dataset containing multi-view images of ~220k real-world objects across 238 classes. As a counterpart of ImageNet, it introduces 3D visual signals via multi-view shooting, building a soft bridge between 2D and 3D vision. This paper constructs the MVImgNet2.0 dataset, which expands MVImgNet to a total of ~520k objects and 515 categories, yielding a 3D dataset whose scale is more comparable to those in the 2D domain. Beyond the enlarged scale and category range, MVImgNet2.0 is of higher quality than MVImgNet owing to four new features: (i) most shots capture 360-degree views of the objects, supporting the learning of complete object reconstruction; (ii) the segmentation pipeline is improved to produce more accurate foreground object masks; (iii) a more powerful structure-from-motion method is adopted to estimate the camera pose of each frame with lower error; (iv) higher-quality dense point clouds are reconstructed via advanced methods for objects captured in 360-degree views, which can serve downstream applications. Extensive experiments confirm the value of the proposed MVImgNet2.0 in boosting the performance of large 3D reconstruction models. MVImgNet2.0 will be publicly released at luyues.github.io/mvimgnet2, including the multi-view images of all 520k objects, the reconstructed high-quality point clouds, and the data annotation code, hoping to inspire the broader vision community.