We present DINAR, an approach for creating realistic rigged full-body avatars from single RGB images. Similarly to previous works, our method uses neural textures combined with the SMPL-X body model to achieve photo-realistic avatar quality while keeping the avatars easy to animate and fast at inference. To restore the texture, we use a latent diffusion model and show how such a model can be trained in the neural texture space. The diffusion model allows us to realistically reconstruct large unseen regions, such as the back of a person given only the frontal view. The models in our pipeline are trained using 2D images and videos only. In the experiments, our approach achieves state-of-the-art rendering quality and good generalization to new poses and viewpoints. In particular, the approach improves on the state of the art on the SnapshotPeople public benchmark.