Human motion data is inherently rich and complex, containing both semantic content and subtle stylistic features that are challenging to model. We propose a novel method for effectively disentangling style and content in human motion data to facilitate style transfer. Our approach is guided by the insight that content corresponds to coarse motion attributes, while style captures the finer, expressive details. To model this hierarchy, we employ Residual Vector Quantized Variational Autoencoders (RVQ-VAEs) to learn a coarse-to-fine representation of motion. We further strengthen the disentanglement by integrating contrastive learning and a novel information-leakage loss into codebook learning, organizing content and style across different codebooks. We harness this disentangled representation through a simple and effective inference-time technique, Quantized Code Swapping, which enables motion style transfer without any fine-tuning for unseen styles. Our framework demonstrates strong versatility across multiple inference applications, including style transfer, style removal, and motion blending.
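To make the mechanism concrete, the following is a minimal toy sketch (not the authors' implementation) of the two ideas the abstract names: residual vector quantization, where a coarse codebook captures content and a finer residual codebook captures style, and code swapping, where the content codes of one motion are decoded with the style codes of another. All names, codebook sizes, and latent dimensions here are illustrative assumptions; in the actual method the codebooks are learned jointly with an encoder/decoder and the disentanglement losses.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, codebook):
    """Assign each row of x to its nearest codeword (L2 distance)."""
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = dists.argmin(1)
    return codebook[idx], idx

# Toy codebooks (would be learned in practice):
# level 0 ~ coarse "content", level 1 ~ finer-scale residual "style".
content_cb = rng.normal(size=(8, 4))
style_cb = rng.normal(size=(8, 4)) * 0.1

def rvq_encode(x):
    # Residual VQ: quantize, subtract, then quantize the residual.
    q0, i0 = quantize(x, content_cb)
    _, i1 = quantize(x - q0, style_cb)
    return i0, i1

def rvq_decode(content_idx, style_idx):
    # Reconstruction is the sum of codewords across quantization levels.
    return content_cb[content_idx] + style_cb[style_idx]

# Two "motions" as short sequences of latent frames (5 frames, dim 4).
neutral = rng.normal(size=(5, 4))
angry = rng.normal(size=(5, 4))

c_neutral, s_neutral = rvq_encode(neutral)
c_angry, s_angry = rvq_encode(angry)

# Quantized Code Swapping (illustrative): keep the neutral motion's
# content codes but decode them with the angry motion's style codes.
stylized = rvq_decode(c_neutral, s_angry)
print(stylized.shape)
```

Because swapping happens purely on discrete codes at inference time, no gradient update is needed for a new style, which is the point the abstract makes about handling unseen styles without fine-tuning.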