We present DINAR, an approach for creating realistic rigged full-body avatars from single RGB images. Similar to previous work, our method uses neural textures combined with the SMPL-X body model to achieve photo-realistic avatar quality while keeping the avatars easy to animate and fast to infer. To restore the texture, we use a latent diffusion model and show how such a model can be trained in the neural texture space. The diffusion model allows us to realistically reconstruct large unseen regions, such as the back of a person given a frontal view. The models in our pipeline are trained using 2D images and videos only. In our experiments, the approach achieves state-of-the-art rendering quality and generalizes well to new poses and viewpoints. In particular, it improves on the state of the art on the public SnapshotPeople benchmark.