This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While classical approaches to modeling and rendering virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real time, and their quality degrades when the character is animated with body poses that differ from the training observations. We propose the first animatable human model based on 3D Gaussian Splatting, which has recently emerged as a very efficient alternative to neural radiance fields. The body is represented by a set of Gaussian primitives in a canonical space, which are deformed in a coarse-to-fine approach that combines forward skinning and local non-rigid refinement. We describe how to learn our Human Gaussian Splatting model in an end-to-end fashion from multi-view observations, and evaluate it against state-of-the-art approaches for novel pose synthesis of clothed bodies. Our method achieves a PSNR 1.5 dB higher than the state of the art on the THuman4 dataset while rendering at 20 fps or more.
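To put the reported PSNR margin in perspective, the short sketch below converts a PSNR gain in decibels into the corresponding reduction in mean squared error, using the standard definition PSNR = 10 log10(MAX^2 / MSE). It is only an illustration of the metric, not part of the method: the function names are hypothetical, and pixel values are assumed to be normalized to [0, 1].

```python
import math


def psnr(mse: float, max_value: float = 1.0) -> float:
    """PSNR in dB for a given mean squared error and peak pixel value.

    Assumes images normalized to [0, max_value]; names are illustrative.
    """
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which the MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


# A 1.5 dB gain corresponds to roughly 29% lower mean squared error.
ratio = mse_ratio_for_gain(1.5)
print(f"MSE ratio for +1.5 dB: {ratio:.3f}")  # prints 0.708
```

Because PSNR is logarithmic, the same 1.5 dB difference implies the same relative MSE reduction regardless of the baseline PSNR value.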