Existing one-shot 4D head synthesis methods usually learn from monocular videos with the aid of 3DMM reconstruction, yet the latter is equally challenging, which limits the quality of the 4D heads they can synthesize. We present a method that learns one-shot 4D head synthesis from large-scale synthetic data. The key is to first learn a part-wise 4D generative model from monocular images via adversarial learning, which synthesizes multi-view images of diverse identities and full motions as training data; we then use a transformer-based animatable triplane reconstructor to learn 4D head reconstruction from the synthetic data. A novel learning strategy enhances generalizability to real images by disentangling the learning processes of 3D reconstruction and reenactment. Experiments demonstrate that our method outperforms prior art.