We present a novel method for reconstructing clothed humans from a sparse set of RGB images (e.g., 1 to 6). Despite impressive results from recent works employing deep implicit representations, we revisit the volumetric approach and demonstrate that better performance can be achieved with proper system design. The volumetric representation offers significant advantages in leveraging 3D spatial context through 3D convolutions, and the notorious quantization error is largely negligible at a reasonably large yet affordable volume resolution, e.g., 512. To handle memory and computation costs, we propose a sophisticated coarse-to-fine strategy with voxel culling and subspace sparse convolution. Our method starts with a discretized visual hull to compute a coarse shape and then focuses on a narrow band near the coarse shape for refinement. Once the shape is reconstructed, we adopt an image-based rendering approach, which computes the colors of surface points by blending the input images with learned weights. Extensive experimental results show that our method reduces the mean point-to-surface (P2S) error of state-of-the-art methods by more than 50%, achieving approximately 2 mm accuracy at a volume resolution of 512. Additionally, images rendered from our textured model achieve a higher peak signal-to-noise ratio (PSNR) than those of state-of-the-art methods.
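The coarse-to-fine idea (a discretized visual hull followed by refinement restricted to a narrow band) can be illustrated with a minimal sketch. This is not the method's implementation: the function names `visual_hull` and `narrow_band`, the three axis-aligned orthographic views, the toy 16-voxel grid, and the band radius are all assumptions made for illustration, and the learned refinement network is omitted.

```python
from itertools import product


def visual_hull(sil_x, sil_y, sil_z, n):
    """Carve an n^3 grid: keep a voxel only if every view sees it as foreground.

    Assumption: three orthographic views along the x, y and z axes, each an
    n-by-n table of booleans. Real inputs would need calibrated cameras.
    """
    return {
        (x, y, z)
        for x, y, z in product(range(n), repeat=3)
        if sil_x[y][z] and sil_y[x][z] and sil_z[x][y]
    }


def narrow_band(hull, n, radius=1):
    """Return voxels whose (2*radius+1)^3 neighbourhood straddles the hull surface.

    Voxels outside the grid count as empty. Only the returned voxels would be
    passed on to a refinement stage; everything else is culled.
    """
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    band = set()
    for x, y, z in product(range(n), repeat=3):
        occupied = {(x + dx, y + dy, z + dz) in hull for dx, dy, dz in offsets}
        if len(occupied) == 2:  # both occupied and empty neighbours present
            band.add((x, y, z))
    return band


n = 16
# Toy silhouette: an 8x8 square centred in a 16x16 image, reused for all views.
square = [[4 <= i < 12 and 4 <= j < 12 for j in range(n)] for i in range(n)]
hull = visual_hull(square, square, square, n)
band = narrow_band(hull, n)
print(len(hull), len(band), n ** 3)
```

On this toy input the band holds 784 of the 4096 voxels. The band is a thin shell around the surface, so its size grows roughly with the square of the resolution while the full grid grows with the cube, which is why culling matters more at resolutions such as 512.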