We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view and novel-pose image synthesis but often require days of training and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Although extremely fast to train, these methods barely reach interactive rendering frame rates, at around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, which improve the generalization of our model to highly articulated unseen poses. Experimental results show that our method achieves comparable or even better performance than state-of-the-art approaches to animatable avatar creation from monocular input, while being 400x and 250x faster in training and inference, respectively.
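To make the as-isometric-as-possible idea concrete, the following is a minimal sketch of a distance-preserving penalty on Gaussian mean vectors. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not give the loss, so the k-nearest-neighbor construction, the absolute-difference penalty, and the names `nearest_neighbors` and `isometry_loss` are all hypothetical. The analogous term on covariance matrices is omitted.

```python
import math


def nearest_neighbors(points, k):
    """For each point, return the indices of its k nearest other points."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


def isometry_loss(canonical, deformed, k=3):
    """Mean absolute change in neighbor distances between two point sets.

    canonical: Gaussian means in the rest pose, as 3-tuples (assumed input).
    deformed:  the same Gaussians after deformation, in the same order.
    Neighbors are chosen in the canonical pose (an assumption of this sketch).
    """
    neighbors = nearest_neighbors(canonical, k)
    total, count = 0.0, 0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            d_can = math.dist(canonical[i], canonical[j])
            d_def = math.dist(deformed[i], deformed[j])
            total += abs(d_can - d_def)
            count += 1
    return total / count if count else 0.0


# Usage: a rigid translation keeps all distances, a uniform scaling does not.
rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in rest]
scaled = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in rest]
loss_shifted = isometry_loss(rest, shifted)
loss_scaled = isometry_loss(rest, scaled)
```

A penalty of this kind is zero for rigid motions and grows as the deformation stretches or compresses local neighborhoods, which is the property a regularizer needs in order to discourage implausible distortions on unseen poses.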