We propose PoseGaussian, a pose-guided Gaussian Splatting framework for high-fidelity human novel view synthesis. Human body pose serves a dual purpose in our design: as a structural prior, its features are fused with those of a color encoder to refine depth estimation; as a temporal cue, it is processed by a dedicated pose encoder to enhance temporal consistency across frames. These components are integrated into a fully differentiable, end-to-end trainable pipeline. Unlike prior work that uses pose only as a condition or for warping, PoseGaussian embeds pose signals into both the geometric and temporal stages to improve robustness and generalization. It is specifically designed to address challenges inherent in dynamic human scenes, such as articulated motion and severe self-occlusion. Notably, our framework achieves real-time rendering at 100 FPS, matching the efficiency of standard Gaussian Splatting pipelines. We validate our approach on ZJU-MoCap, THuman2.0, and in-house datasets, demonstrating state-of-the-art performance in perceptual quality and structural accuracy (PSNR 30.86, SSIM 0.979, LPIPS 0.028).
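To make the dual role of pose concrete, the following minimal NumPy sketch (our own simplification; every function name, dimension, and weight here is hypothetical and not the paper's actual implementation) shows the two pathways: pose features fused with color features to predict a residual refinement of a coarse depth map, and a separate pose encoding that could serve as a per-frame temporal embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_pose(keypoints):
    # Hypothetical pose encoder: flatten 2D joint positions and
    # project them to a fixed-size embedding (stand-in for the
    # dedicated pose encoder used for temporal consistency).
    W = rng.standard_normal((keypoints.size, 32)) * 0.1
    return np.tanh(keypoints.reshape(-1) @ W)

def refine_depth(color_feat, pose_feat, coarse_depth):
    # Structural-prior branch: concatenate color and pose features,
    # then predict a residual correction added to the coarse depth.
    fused = np.concatenate([color_feat, pose_feat])
    W = rng.standard_normal((fused.size, coarse_depth.size)) * 0.01
    residual = (fused @ W).reshape(coarse_depth.shape)
    return coarse_depth + residual

keypoints = rng.standard_normal((24, 2))   # e.g. 24 body joints, (x, y)
color_feat = rng.standard_normal(64)       # features from a color encoder
coarse_depth = np.full((8, 8), 2.0)        # coarse depth map, in meters

pose_feat = encode_pose(keypoints)         # temporal-cue embedding
depth = refine_depth(color_feat, pose_feat, coarse_depth)
print(depth.shape)                         # same resolution as coarse map
```

In the actual pipeline these branches would be learned end-to-end and feed Gaussian parameters rather than a toy depth grid; the sketch only illustrates the fusion pattern.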