Articulated 3D reconstruction has valuable applications in many domains, yet it remains costly and demands intensive work from domain experts. Recent template-free learning methods have shown promising results on monocular videos. However, these approaches require the input video to cover all viewpoints of the subject, which limits their applicability to casually captured videos from online sources. In this work, we study articulated 3D shape reconstruction from a single, casually captured internet video in which the subject's view coverage is incomplete. We propose DreaMo, which jointly performs shape reconstruction and resolves the challenging low-coverage regions using a view-conditioned diffusion prior and several tailored regularizations. In addition, we introduce a skeleton generation strategy that creates human-interpretable skeletons from the learned neural bones and skinning weights. We conduct our study on a self-collected internet video collection characterized by incomplete view coverage. DreaMo shows promising quality in novel-view rendering, detailed articulated shape reconstruction, and skeleton generation. Extensive qualitative and quantitative studies validate the efficacy of each proposed component and show that existing methods fail to recover correct geometry due to the incomplete view coverage.