AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting

We present a novel framework for animating humans in 3D scenes using 3D Gaussian Splatting (3DGS), a neural scene representation that has recently achieved state-of-the-art photorealistic results for novel-view synthesis but remains under-explored for human-scene animation and interaction. Unlike existing animation pipelines that use meshes or point clouds as the underlying 3D representation, our approach uses 3DGS to represent both humans and scenes. Because both are represented as Gaussians, our approach allows geometry-consistent free-viewpoint rendering of humans interacting with 3D scenes. Our key insight is that rendering can be decoupled from motion synthesis, and each sub-problem can be addressed independently without the need for paired human-scene data. Central to our method is a Gaussian-aligned motion module that synthesizes motion without explicit scene geometry, using opacity-based cues and projected Gaussian structures to guide human placement and pose alignment. To ensure natural interactions, we further propose a human-scene Gaussian refinement optimization that enforces realistic contact and navigation. We evaluate our approach on scenes from ScanNet++ and the SuperSplat library, and on avatars reconstructed from sparse and dense multi-view human capture. Finally, we demonstrate that our framework enables novel applications such as geometry-consistent free-viewpoint rendering of edited monocular RGB videos with newly animated humans, showcasing the unique advantages of 3DGS for monocular video-based human animation. To assess the full quality of our results, we encourage readers to view the supplementary material available at https://miraymen.github.io/aha/ .
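
As a rough illustration of what "opacity-based cues and projected Gaussian structures" could look like in practice, the sketch below projects Gaussian centers onto the ground plane and accumulates their opacities into a top-down occupancy grid that a placement step might consult. This is not the authors' motion module: the function name `opacity_occupancy_map`, the z-up axis convention, the height band, the cell size, and the threshold are all assumptions made for the example.

```python
import numpy as np

def opacity_occupancy_map(centers, opacities, cell=0.1,
                          min_h=0.1, max_h=1.8, thresh=0.5):
    """Hypothetical helper: top-down occupancy grid from scene Gaussians.

    centers:   (N, 3) Gaussian means, assumed z-up with the floor near z = 0.
    opacities: (N,) per-Gaussian opacity in [0, 1].
    Returns a boolean grid (True = likely obstacle) and the grid origin (x, y).
    """
    centers = np.asarray(centers, dtype=float)
    opacities = np.asarray(opacities, dtype=float)

    # Grid extent covers the (x, y) footprint of all Gaussians.
    lo = centers[:, :2].min(axis=0)
    hi = centers[:, :2].max(axis=0)
    shape = np.floor((hi - lo) / cell).astype(int) + 1

    # Only Gaussians inside a body-height band count as obstacles
    # (assumed band; floor and ceiling Gaussians are ignored).
    in_band = (centers[:, 2] >= min_h) & (centers[:, 2] <= max_h)
    idx = np.floor((centers[in_band, :2] - lo) / cell).astype(int)

    # Accumulate opacity per cell; np.add.at handles repeated indices.
    acc = np.zeros(shape)
    np.add.at(acc, (idx[:, 0], idx[:, 1]), opacities[in_band])

    return acc >= thresh, lo
```

Cells that stay below the threshold could be treated as candidate free space for placing or routing a human; a real system would also need to account for each Gaussian's scale and orientation, which this sketch ignores.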
