We propose a method for reconstructing high-fidelity human avatars from multi-view video that runs on mobile devices. Many existing works build high-quality Gaussian-based full-body avatars from multi-view video, but they rely on heavy computation to obtain pose-dependent appearance, which makes deployment on mobile devices very difficult. Recent methods distill from pretrained models and model the pose-dependent, nonlinear Gaussian attributes as a linear combination of blendshapes driven by global pose features. Although these methods run on mobile devices, they lose some detail. We observe that Gaussians within a local region of the body are often highly correlated, so their attributes can be modeled linearly with less error. We therefore apply local linear blendshapes to small body parts to capture the globally nonlinear changes in Gaussian attributes. To further reduce computation and model size, we remove the blendshapes of Gaussians whose attributes change little, yielding a minimal blendshape representation. Our method is trained end to end and requires no pretrained model. To run across a range of devices, we implement it in WebGPU. Experiments show that our method renders high-quality human avatars with finer detail and reaches 120 FPS at 2K resolution on mobile devices.
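To make the two ideas above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each Gaussian is assigned to one body part, that each part supplies its own blendshape coefficients for a given pose, and that pruning is decided by the largest offset observed over a set of sample poses. The names `Gaussian`, `local_offset`, `evaluate`, and `prune`, the part labels, and the threshold criterion are all hypothetical; the abstract does not say how coefficients are computed or how "changes little" is measured.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Vector = List[float]


@dataclass
class Gaussian:
    part: str                            # body part label (hypothetical)
    base: Vector                         # pose-independent attribute values
    blendshapes: Optional[List[Vector]]  # K basis vectors, or None once pruned


def local_offset(g: Gaussian, part_coeffs: Dict[str, Vector]) -> Vector:
    """Linear combination of g's blendshapes, weighted by its own part's coefficients."""
    offset = [0.0] * len(g.base)
    if g.blendshapes is None:
        return offset  # pruned Gaussians stay at their base attributes
    for weight, basis in zip(part_coeffs[g.part], g.blendshapes):
        for d, value in enumerate(basis):
            offset[d] += weight * value
    return offset


def evaluate(g: Gaussian, part_coeffs: Dict[str, Vector]) -> Vector:
    """Pose-dependent attributes: base values plus the local linear offset."""
    return [b + o for b, o in zip(g.base, local_offset(g, part_coeffs))]


def prune(gaussians: List[Gaussian],
          sample_poses: List[Dict[str, Vector]],
          threshold: float) -> int:
    """Drop blendshapes of Gaussians whose offsets stay below `threshold`
    on every sample pose (an assumed criterion). Returns the number pruned."""
    pruned = 0
    for g in gaussians:
        if g.blendshapes is None:
            continue
        peak = max(abs(v) for pose in sample_poses for v in local_offset(g, pose))
        if peak < threshold:
            g.blendshapes = None
            pruned += 1
    return pruned
```

In this sketch, a Gaussian on the arm depends only on the arm's coefficients, so each linear model has to explain a small region rather than the whole body. A pruned Gaussian stores no basis vectors and costs no multiply-adds at render time, which is the intended source of the savings in computation and model size.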