Despite recent advances in high-fidelity human reconstruction, the need for densely captured images or time-consuming per-instance optimization significantly hinders application in broader scenarios. To tackle these issues, we present HumanSplat, which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors, which adeptly integrates geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and to better constrain the estimated multiple views. Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in photorealistic novel-view synthesis.