We present DINAR, an approach for creating realistic rigged full-body avatars from single RGB images. Similarly to previous works, our method uses neural textures combined with the SMPL-X body model to achieve photo-realistic avatar quality while keeping the avatars easy to animate and fast to infer. To restore the texture, we use a latent diffusion model and show how such a model can be trained in the neural texture space. The diffusion model allows us to realistically reconstruct large unseen regions, such as the back of a person given only the frontal view. The models in our pipeline are trained using 2D images and videos only. In the experiments, our approach achieves state-of-the-art rendering quality and good generalization to new poses and viewpoints. In particular, it improves on the state of the art on the public SnapshotPeople benchmark.