We introduce SAOR, a novel approach for estimating the 3D shape, texture, and viewpoint of an articulated object from a single image captured in the wild. Unlike prior approaches that rely on pre-defined category-specific 3D templates or tailored 3D skeletons, SAOR learns to articulate shapes from single-view image collections using a skeleton-free, part-based model, without requiring any 3D object shape priors. To prevent ill-posed solutions, we propose a cross-instance consistency loss that exploits disentangled object shape deformation and articulation. This is aided by a new silhouette-based sampling mechanism that enhances viewpoint diversity during training. During training, our method requires only estimated object silhouettes and relative depth maps from off-the-shelf pre-trained networks. At inference time, given a single-view image, it efficiently outputs an explicit mesh representation. We obtain improved qualitative and quantitative results on challenging quadruped animals compared to relevant existing work.