Real-time free-view human rendering from sparse-view RGB inputs is a challenging task due to sensor scarcity and the tight time budget. To ensure efficiency, recent methods leverage 2D CNNs operating in texture space to learn rendering primitives. However, they either jointly learn geometry and appearance, or completely ignore sparse image information for geometry estimation, significantly harming visual quality and robustness to unseen body poses. To address these issues, we present Double Unprojected Textures, which at its core disentangles coarse geometric deformation estimation from appearance synthesis, enabling robust and photorealistic 4K rendering in real time. Specifically, we first introduce a novel image-conditioned template deformation network that estimates the coarse deformation of the human template from a first unprojected texture. The updated geometry is then used to perform a second, more accurate texture unprojection. The resulting texture map has fewer artifacts and aligns better with the input views, which benefits our learning of finer-level geometry and appearance represented by Gaussian splats. We validate the effectiveness and efficiency of the proposed method in quantitative and qualitative experiments, where it significantly surpasses other state-of-the-art methods.
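The double-unprojection pipeline described above can be sketched in a minimal toy form: unproject input-view pixels onto a template's texels, feed that texture to a deformation predictor, update the template, and unproject again. This is only an illustrative sketch under simplifying assumptions (a flat grid of template points with a scanline UV layout, a single pinhole camera, and a stand-in `predict_deformation` that returns zero offsets); none of these names or layouts come from the paper itself.

```python
import numpy as np

def project(points, K):
    """Pinhole projection of Nx3 points with intrinsics K -> Nx2 pixels."""
    uv = (K @ points.T).T
    return uv[:, :2] / uv[:, 2:3]

def unproject_texture(image, points, K, tex_size=8):
    """Sample image colors at each point's projection into a UV texture.
    Toy assumption: points map to texels in scanline order."""
    pix = np.round(project(points, K)).astype(int)
    h, w = image.shape[:2]
    pix[:, 0] = np.clip(pix[:, 0], 0, w - 1)
    pix[:, 1] = np.clip(pix[:, 1], 0, h - 1)
    colors = image[pix[:, 1], pix[:, 0]]
    return colors.reshape(tex_size, tex_size, -1)

def predict_deformation(texture):
    """Stand-in for the image-conditioned deformation network:
    returns a per-vertex offset (zeros here, a real network would
    regress coarse template deformation from the texture)."""
    n = texture.shape[0] * texture.shape[1]
    return np.zeros((n, 3))

# Toy data: an 8x8 grid of template points at depth 2 and one gray image.
ts = 8
gx, gy = np.meshgrid(np.linspace(-0.5, 0.5, ts), np.linspace(-0.5, 0.5, ts))
template = np.stack([gx.ravel(), gy.ravel(), np.full(ts * ts, 2.0)], axis=1)
K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
image = np.full((64, 64, 3), 0.5)

tex1 = unproject_texture(image, template, K, ts)     # first unprojection
deformed = template + predict_deformation(tex1)      # coarse deformation
tex2 = unproject_texture(image, deformed, K, ts)     # second unprojection
print(tex2.shape)  # (8, 8, 3)
```

In the actual method the second texture, being better aligned with the input views, conditions the finer-level geometry and appearance (Gaussian splat) prediction; this sketch only shows the two-pass data flow.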