Articulated 3D reconstruction has valuable applications in various domains, yet it remains costly and demands intensive work from domain experts. Recent template-free learning methods show promising results on monocular videos. However, these approaches require the input video to cover all viewpoints of the subject, which limits their applicability to casually captured videos from online sources. In this work, we study articulated 3D shape reconstruction from a single, casually captured internet video in which the subject's view coverage is incomplete. We propose DreaMo, which jointly performs shape reconstruction and resolves the challenging low-coverage regions using a view-conditioned diffusion prior and several tailored regularizations. In addition, we introduce a skeleton generation strategy that creates human-interpretable skeletons from the learned neural bones and skinning weights. We conduct our study on a self-collected set of internet videos characterized by incomplete view coverage. DreaMo achieves promising quality in novel-view rendering, detailed articulated shape reconstruction, and skeleton generation. Extensive qualitative and quantitative studies validate the efficacy of each proposed component and show that existing methods fail to recover correct geometry under incomplete view coverage.