We present a method to reconstruct time-consistent human body models from monocular videos, focusing on extremely loose clothing or handheld object interactions. Prior work in human reconstruction is either limited to tight clothing with no object interactions, or requires calibrated multi-view captures or personalized template scans which are costly to collect at scale. Our key insight for high-quality yet flexible reconstruction is the careful combination of generic human priors about articulated body shape (learned from large-scale training data) with video-specific articulated "bag-of-bones" deformation (fit to a single video via test-time optimization). We accomplish this by learning a neural implicit model that disentangles body versus clothing deformations as separate motion model layers. To capture subtle geometry of clothing, we leverage image-based priors such as human body pose, surface normals, and optical flow during optimization. The resulting neural fields can be extracted into time-consistent meshes, or further optimized as explicit 3D Gaussians for high-fidelity interactive rendering. On datasets with highly challenging clothing deformations and object interactions, DressRecon yields higher-fidelity 3D reconstructions than prior art. Project page: https://jefftan969.github.io/dressrecon/