In this work, we tackle the task of learning 3D human Gaussians from a single image, focusing on recovering detailed appearance and geometry, including unobserved regions. We introduce a single-view generalizable Human Gaussian Model (HGM), which employs a novel generate-then-refine pipeline guided by a human body prior and a diffusion prior. Our approach first uses a ControlNet to refine back-view images rendered from the coarse predicted human Gaussians, and then uses the refined images together with the input image to reconstruct refined human Gaussians. To mitigate the generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch, propagating image features from the SMPL-X volume to the image Gaussians via sparse convolution and attention mechanisms. Since the initial SMPL-X estimate may be inaccurate, we gradually refine it with our HGM model. We validate our approach on several publicly available datasets. Our method outperforms previous methods in both novel view synthesis and surface reconstruction, and exhibits strong generalization in cross-dataset evaluation and on in-the-wild images.
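The generate-then-refine pipeline described above can be summarized as a two-stage loop: predict coarse Gaussians under the SMPL-X prior, render and refine the unobserved back view with a diffusion prior, then reconstruct from both views. The sketch below is only an illustrative outline; all function names are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of the HGM generate-then-refine pipeline.
# Every function here is a placeholder standing in for a learned module.

def predict_coarse_gaussians(front_image, smplx_params):
    """Stage 1: predict coarse human Gaussians from the single input view,
    conditioned on the SMPL-X body prior (dual-branch feature propagation)."""
    return {"gaussians": "coarse", "inputs": [front_image, smplx_params]}

def render_back_view(gaussians):
    """Render the unobserved back view from the coarse Gaussians."""
    return f"back_render_of_{gaussians['gaussians']}"

def controlnet_refine(back_render):
    """Stage 2a: refine the coarse back-view render with a ControlNet-style
    diffusion prior (placeholder for the actual diffusion step)."""
    return back_render.replace("coarse", "refined")

def reconstruct_refined_gaussians(front_image, refined_back):
    """Stage 2b: reconstruct refined Gaussians from the input view plus the
    refined back view."""
    return {"gaussians": "refined", "views": [front_image, refined_back]}

def hgm_pipeline(front_image, smplx_params):
    coarse = predict_coarse_gaussians(front_image, smplx_params)
    back = render_back_view(coarse)
    refined_back = controlnet_refine(back)
    return reconstruct_refined_gaussians(front_image, refined_back)
```

In practice each placeholder would be a trained network, and the SMPL-X estimate itself would be iteratively updated alongside the Gaussians, as noted above.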