We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects. Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos, and can decompose the objects into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, coupled with a keypoint estimator through a differentiable PnP algorithm, our model learns the underlying object geometry and part decomposition in an entirely unsupervised manner. This allows it to perform 3D segmentation, 3D keypoint estimation, novel view synthesis, and animation. We primarily evaluate the framework on two video datasets: VoxCeleb $256^2$ and TEDXPeople $256^2$. In addition, on the Cats $256^2$ image dataset, we show that it learns compelling 3D geometry even from still images. Finally, we show that our model can obtain animatable 3D objects from a single image or a few images. Code and visual results are available on our project website: https://snap-research.github.io/unsupervised-volumetric-animation
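To make the role of PnP (Perspective-n-Point) in the abstract concrete, the following is a minimal sketch of the underlying problem: recovering a rigid pose from 3D keypoints and their 2D projections. It uses a plain linear DLT solve in NumPy and is not the differentiable PnP solver used in the paper. The function names `project` and `pnp_dlt`, the pinhole intrinsics `K`, and the assumption of at least six noise-free, non-coplanar points are all illustrative choices.

```python
import numpy as np


def project(points_3d, R, t, K):
    """Project Nx3 world points to Nx2 pixels with a pinhole camera (R, t, K)."""
    cam = points_3d @ R.T + t          # world -> camera coordinates
    uvw = cam @ K.T                    # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide


def pnp_dlt(points_3d, points_2d, K):
    """Estimate (R, t) from N >= 6 non-coplanar 3D-2D correspondences via DLT.

    Illustrative linear solve only; it assumes K has last row [0, 0, 1].
    """
    n = points_3d.shape[0]
    ones = np.ones((n, 1))

    # Remove the intrinsics so that the unknown matrix is P = s * [R | t].
    xn = (np.linalg.inv(K) @ np.hstack([points_2d, ones]).T).T
    u, v = xn[:, 0:1], xn[:, 1:2]

    # Each correspondence contributes two linear equations in the 12 entries of P.
    Xh = np.hstack([points_3d, ones])
    zeros = np.zeros_like(Xh)
    A = np.vstack([
        np.hstack([Xh, zeros, -u * Xh]),
        np.hstack([zeros, Xh, -v * Xh]),
    ])

    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)

    # P is defined up to sign; pick the sign that gives a proper rotation.
    if np.linalg.det(P[:, :3]) < 0:
        P = -P

    # Project the 3x3 block onto the nearest rotation and undo the scale.
    U, S, Vt_r = np.linalg.svd(P[:, :3])
    R = U @ Vt_r
    t = P[:, 3] / S.mean()
    return R, t
```

With exact correspondences, `pnp_dlt(points_3d, project(points_3d, R, t, K), K)` should return the original `R` and `t` up to numerical precision. A differentiable variant would additionally need gradients to flow through the solve, which this sketch does not address.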