The choice of data representation is a key factor in the success of deep learning in geometric tasks. For instance, DUSt3R has recently introduced the concept of viewpoint-invariant point maps, generalizing depth prediction and showing that the key problems in the 3D reconstruction of static scenes reduce to predicting such point maps. In this paper, we develop an analogous concept for a very different problem, namely the reconstruction of the 3D shape and pose of deformable objects. To this end, we introduce Dual Point Maps (DualPM), where a pair of point maps is extracted from the same image: one associates pixels with their 3D locations on the object, and the other with a canonical version of the object in its rest pose. We also extend point maps to amodal reconstruction, seeing through self-occlusions to recover the complete shape of the object. We show that 3D reconstruction and 3D pose estimation reduce to the prediction of DualPMs. We demonstrate empirically that this representation is a good target for a deep network to predict; specifically, we model horses, showing that DualPMs can be trained purely on 3D synthetic data consisting of a single horse model, while generalizing very well to real images. With this, we improve by a large margin on previous methods for the 3D analysis and reconstruction of objects of this type.
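To make the representation concrete, the following is a minimal sketch, not the paper's implementation: a dual point map pairs, per pixel, a 3D point on the posed object with its corresponding point on a canonical rest-pose model. All names are ours, and the sketch simplifies the deformable setting to a rigid one, so that aligning the two maps (here via the Kabsch/Procrustes algorithm) illustrates how pose estimation reduces to predicting the maps.

```python
import numpy as np

# Hypothetical dual point maps for an H x W image (names are illustrative):
#   canonical[y, x] = 3D point on the rest-pose model for pixel (y, x)
#   posed[y, x]     = 3D point on the posed object for the same pixel
H, W = 4, 4
rng = np.random.default_rng(0)
canonical = rng.normal(size=(H, W, 3))

def random_rotation(rng):
    """Uniform-ish random rotation matrix from a normalized quaternion."""
    q = rng.normal(size=4)
    q /= np.linalg.norm(q)
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

# Simulate a rigid pose relating the two maps: posed = R * canonical + t.
R_true = random_rotation(rng)
t_true = np.array([0.3, -0.1, 0.5])
posed = canonical @ R_true.T + t_true

# Given both maps, pose recovery is a point-set alignment (Kabsch).
X = canonical.reshape(-1, 3)
Y = posed.reshape(-1, 3)
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
U, _, Vt = np.linalg.svd(Xc.T @ Yc)
D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
R_est = Vt.T @ D @ U.T
t_est = Y.mean(0) - R_est @ X.mean(0)
```

In the rigid case above the fit is exact, so `R_est` and `t_est` recover `R_true` and `t_true` up to numerical precision; the paper's deformable setting replaces the single rigid transform with an articulated/deformed mapping, but the reduction of pose to map alignment is the same idea.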