An Identity-Preserved Framework for Human Motion Transfer

Human motion transfer (HMT) aims to generate a video clip of a target subject imitating a source subject's motion. Although previous methods synthesize high-quality videos, they overlook the individualized motion information in the source and target motions, which is important for the realism of the motion in the generated video. To address this problem, we propose a novel identity-preserved HMT network, termed \textit{IDPres}. \textit{IDPres} is a skeleton-based approach that incorporates the target's individualized motion and skeleton information to augment identity representations, which significantly enhances the realism of movements in the generated videos. Our method focuses on the fine-grained disentanglement and synthesis of motion. To improve representation learning in the latent space and facilitate the training of \textit{IDPres}, we introduce three training schemes. These schemes enable \textit{IDPres} to disentangle different representations concurrently and control them accurately, so that the desired motions can be synthesized. To evaluate the proportion of individualized motion information in the generated video, we are the first to introduce a quantitative metric, the Identity Score (\textit{ID-Score}), motivated by the success of gait recognition methods in capturing identity information. Moreover, we collect an identity-motion paired dataset, $Dancer101$, consisting of solo-dance videos of 101 subjects from the public domain, providing a benchmark to promote the development of HMT methods. Extensive experiments demonstrate that \textit{IDPres} surpasses existing state-of-the-art methods in reconstruction accuracy, motion realism, and identity preservation.
