Existing neural rendering methods for creating human avatars typically either require dense input signals, such as video or multi-view images, or rely on a prior learned from large-scale 3D human datasets so that reconstruction can be performed from sparse-view inputs. Most of these methods fail to produce realistic reconstructions when only a single image is available. To enable data-efficient creation of realistic, animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can effortlessly estimate body geometry and imagine full-body clothing from a single image, we leverage two priors in ELICIT: a 3D geometry prior and a visual semantic prior. Specifically, ELICIT takes the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and implements the visual clothing semantic prior with CLIP-based pre-trained models. Both priors jointly guide the optimization toward plausible content in the invisible areas. By taking advantage of the CLIP models, ELICIT can also use text descriptions to generate text-conditioned content in unseen regions. To further improve visual details, we propose a segmentation-based sampling strategy that locally refines different parts of the avatar. Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT outperforms strong baseline methods for avatar creation when only a single image is available. The code is publicly available for research purposes at https://elicit3d.github.io/
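To make the idea of a visual semantic prior more concrete, the following minimal sketch shows one way such a prior could be scored: a cosine-distance loss between the embedding of the input image and the embedding of a rendered novel view. It is an illustration under stated assumptions, not ELICIT's implementation. The function names `cosine_similarity` and `semantic_consistency_loss`, the exact loss form, and the use of plain Python lists in place of real CLIP embeddings are all assumptions made for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors.
    return dot / (norm_a * norm_b + eps)


def semantic_consistency_loss(ref_embedding: Sequence[float],
                              view_embedding: Sequence[float]) -> float:
    """Hypothetical loss: 1 - cosine similarity.

    ref_embedding  -- embedding of the single input image (assumed to come
                      from a pre-trained image encoder such as CLIP)
    view_embedding -- embedding of a rendered view from an unseen direction
    The loss is near 0 when the two views are semantically aligned and
    grows toward 2 as the embeddings point in opposite directions.
    """
    return 1.0 - cosine_similarity(ref_embedding, view_embedding)


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings.
    reference = [0.2, 0.9, 0.1]
    rendered = [0.25, 0.85, 0.05]
    print(semantic_consistency_loss(reference, rendered))
```

In an actual optimization loop the embeddings would come from a differentiable encoder, so that the loss gradient can flow back to the radiance field. The pure-Python version here only demonstrates the arithmetic.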