Recent years have seen significant progress in building photo-realistic animatable avatars from sparse multi-view videos. However, current workflows struggle to render realistic garment dynamics for loosely dressed characters, as they predominantly rely on naked-body models for human modeling and leave the garment unmodeled. This is mainly because the deformations produced by loose garments are highly non-rigid, and capturing such deformations typically requires dense views as supervision. In this paper, we introduce AniDress, a novel method for generating animatable human avatars in loose clothes from very sparse multi-view videos (4-8 views in our setting). To enable the capture and appearance learning of loose garments in this setting, we employ a virtual bone-based garment rigging model derived from physics-based simulation data. This model allows us to capture and render complex garment dynamics through a set of low-dimensional bone transformations. Technically, we develop a novel method for estimating temporally coherent garment dynamics from a sparse multi-view video. To produce realistic renderings of unseen garment states from these coarse estimates, we introduce a pose-driven deformable neural radiance field conditioned on both body and garment motions, providing explicit control of both parts. At test time, new garment poses can be captured from unseen situations or derived from a physics-based or neural-network-based simulator to drive unseen garment dynamics. To evaluate our approach, we create a multi-view dataset that captures performers in loose garments performing diverse motions. Experiments show that our method renders natural garment dynamics that deviate strongly from the body and generalizes well to both unseen views and poses, surpassing existing methods. The code and data will be publicly available.
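To make the virtual bone-based garment rigging concrete, the following is a minimal linear-blend-skinning sketch: garment vertices are deformed by a weighted blend of a small set of virtual bone transformations, which is the low-dimensional representation the abstract refers to. The function name, array shapes, and use of NumPy are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def skin_garment(verts, weights, rotations, translations):
    """Deform canonical garment vertices by linear blend skinning
    over a set of virtual bones (hypothetical sketch).

    verts:        (V, 3) canonical garment vertices
    weights:      (V, B) per-vertex skinning weights (rows sum to 1)
    rotations:    (B, 3, 3) per-bone rotation matrices
    translations: (B, 3) per-bone translations
    """
    # Transform every vertex by every bone: result has shape (B, V, 3)
    per_bone = np.einsum('bij,vj->bvi', rotations, verts) + translations[:, None, :]
    # Blend the per-bone results with the skinning weights: shape (V, 3)
    return np.einsum('vb,bvi->vi', weights, per_bone)

# Sanity check: identity transforms on two bones leave the vertices unchanged.
verts = np.random.rand(100, 3)
weights = np.full((100, 2), 0.5)
R = np.stack([np.eye(3)] * 2)
t = np.zeros((2, 3))
assert np.allclose(skin_garment(verts, weights, R, t), verts)
```

Because the garment state is reduced to the (B, 3, 3) rotations and (B, 3) translations, a sparse multi-view capture only needs to recover these few bone parameters rather than a dense per-vertex deformation field.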
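The pose-driven deformable radiance field can likewise be sketched as a deformation MLP that warps an observation-space sample into canonical space conditioned on both body and garment poses, followed by a canonical field that predicts density and color. All layer sizes, feature dimensions, and names below are illustrative assumptions under this reading of the abstract, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GarmentConditionedNeRF(nn.Module):
    """Minimal sketch of a pose-driven deformable radiance field
    conditioned on body-pose and garment-bone features (hypothetical)."""

    def __init__(self, body_dim=72, garment_dim=30, hidden=128):
        super().__init__()
        # Deformation field: observation-space point + conditions -> canonical offset.
        self.deform = nn.Sequential(
            nn.Linear(3 + body_dim + garment_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )
        # Canonical radiance field: density and RGB at the warped point.
        self.field = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (sigma, r, g, b)
        )

    def forward(self, x, body_pose, garment_pose):
        cond = torch.cat([x, body_pose, garment_pose], dim=-1)
        x_canonical = x + self.deform(cond)  # non-rigid warp into canonical space
        out = self.field(x_canonical)
        sigma, rgb = out[..., :1], torch.sigmoid(out[..., 1:])
        return sigma, rgb

# Usage: 1024 sampled ray points with per-point conditioning vectors.
pts = torch.rand(1024, 3)
body = torch.rand(1024, 72)     # e.g. flattened body joint rotations
garment = torch.rand(1024, 30)  # e.g. flattened virtual-bone transforms
sigma, rgb = GarmentConditionedNeRF()(pts, body, garment)
```

Feeding the body and garment conditions into the deformation field separately is what gives explicit control of both parts: at test time, the garment condition can be swapped for poses produced by a physics-based or neural simulator while the body condition is driven independently.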