We present AHOY, a method for reconstructing complete, animatable 3D Gaussian avatars from in-the-wild monocular video despite heavy occlusion. Existing methods assume unoccluded input (a fully visible subject, often in a canonical pose), which excludes the vast majority of real-world footage, where people are routinely occluded by furniture, objects, or other people. Reconstructing from such footage poses fundamental challenges: large body regions may never be observed, and multi-view supervision per pose is unavailable. We address these challenges with four contributions: (i) a hallucination-as-supervision pipeline that uses identity-finetuned diffusion models to generate dense supervision for previously unobserved body regions; (ii) a two-stage canonical-to-pose-dependent architecture that bootstraps from sparse observations to full pose-dependent Gaussian maps; (iii) a map-pose/LBS-pose decoupling that absorbs multi-view inconsistencies from the generated data; and (iv) a head/body split supervision strategy that preserves facial identity. We evaluate on YouTube videos and on multi-view capture data with significant occlusion, and demonstrate state-of-the-art reconstruction quality. We also demonstrate that the resulting avatars are robust enough to be animated with novel poses and composited into 3DGS scenes captured using cell-phone video. Our project page is available at https://miraymen.github.io/ahoy/