In this paper, we propose PixelHuman, a novel human rendering model that generates animatable human scenes from a few images of a person with unseen identity, views, and poses. Previous work has demonstrated reasonable performance in novel view and pose synthesis, but it relies on a large number of training images and is trained per scene from videos, so producing an animatable scene from unseen human images requires a significant amount of time. Our method differs from existing methods in that it generalizes to any input image for animatable human synthesis. Given a random pose sequence, our method synthesizes each target scene using a neural radiance field conditioned on a canonical representation and pose-aware pixel-aligned features, both of which are obtained through deformation fields learned in a data-driven manner. Our experiments show that our method achieves state-of-the-art performance in multiview and novel pose synthesis from few-shot images.
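To make the notion of a pixel-aligned feature concrete, the sketch below shows the generic lookup step used by pixel-aligned methods: a 3D query point is projected into a source image and the feature map is sampled bilinearly at that location. This is an illustration under stated assumptions, not PixelHuman's implementation. The function names `project_point` and `sample_bilinear`, the pinhole camera parameters, and the nested-list feature map are all hypothetical, and the learned deformation that would make the feature pose-aware is omitted.

```python
import math


def project_point(point, focal, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates (u, v).

    Assumes an ideal pinhole camera with a single focal length and
    principal point (cx, cy); the point must have z > 0.
    """
    x, y, z = point
    return focal * x / z + cx, focal * y / z + cy


def sample_bilinear(feature_map, u, v):
    """Bilinearly sample an H x W x C feature map (nested lists) at (u, v).

    Coordinates are clamped to the image bounds before sampling.
    """
    height, width = len(feature_map), len(feature_map[0])
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    # Integer corners of the cell containing (u, v).
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, width - 1), min(v0 + 1, height - 1)
    du, dv = u - u0, v - v0
    channels = len(feature_map[0][0])
    out = []
    for c in range(channels):
        # Interpolate along u on the two rows, then along v between them.
        top = feature_map[v0][u0][c] * (1 - du) + feature_map[v0][u1][c] * du
        bottom = feature_map[v1][u0][c] * (1 - du) + feature_map[v1][u1][c] * du
        out.append(top * (1 - dv) + bottom * dv)
    return out


# Toy feature map whose two channels store the pixel coordinates themselves.
toy_features = [[[float(u), float(v)] for u in range(8)] for v in range(6)]
u, v = project_point((1.0, 0.5, 2.0), focal=2.0, cx=4.0, cy=3.0)
feature = sample_bilinear(toy_features, u, v)
```

In a full pipeline, the query point would first be mapped by a deformation field from the target pose into the pose of the source image, and the sampled feature would then condition the radiance field at that point.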