Stylized motion breathes life into characters. However, fixed skeleton structures and style representations prevent existing data-driven motion synthesis methods from generating stylized motion for varied characters. In this work, we propose MotionS, a generative motion stylization pipeline that synthesizes diverse, stylized motion for characters with different skeletal structures using cross-modality style prompts. Our key insight is to embed motion style into a cross-modality latent space and to perceive cross-structure skeleton topologies, which allows motion to be stylized within a canonical motion space. Specifically, the large-scale Contrastive Language-Image Pre-training (CLIP) model is leveraged to construct the cross-modality latent space, enabling flexible style representation within this space. Additionally, two topology-encoded tokens are learned to capture the canonical and the specific skeleton topologies, facilitating cross-structure topology shifting. A topology-shifted stylization diffusion model is then designed to generate motion content for the specific skeleton and to stylize it in the shifted canonical motion space using multi-modality style descriptions. Through an extensive set of examples, we demonstrate the flexibility and generalizability of our pipeline across various characters and style descriptions. Qualitative and quantitative experiments show that our pipeline outperforms state-of-the-art methods, consistently delivering high-quality stylized motion across a broad spectrum of skeletal structures.
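The two ideas named in the abstract, a shared latent space for style prompts from different modalities and two topology-encoded tokens, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `ToyStyleSpace` uses random linear projections as a stand-in for CLIP's text and image encoders (it is not CLIP and has no learned alignment), and `build_conditioned_sequence` prepends the style and topology tokens to the motion sequence, which is only one plausible way a diffusion denoiser might consume them. The abstract does not specify the actual architecture.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


class ToyStyleSpace:
    """Hypothetical stand-in for a CLIP-like shared latent space.

    Real CLIP encoders are trained so that matching text and images land
    close together; the random projections here only mimic the interface.
    """

    def __init__(self, text_dim, image_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_text = rng.standard_normal((text_dim, latent_dim))
        self.w_image = rng.standard_normal((image_dim, latent_dim))

    def embed_text(self, text_features):
        # Project text features into the shared space and normalize.
        return l2_normalize(text_features @ self.w_text)

    def embed_image(self, image_features):
        # Project image features into the same shared space and normalize.
        return l2_normalize(image_features @ self.w_image)


def build_conditioned_sequence(motion, style, canonical_token, specific_token):
    """Prepend [style, canonical topology, specific topology] to the frames.

    Assumption: conditioning by token prepending; the source does not say
    how the tokens enter the diffusion model.
    """
    prefix = np.stack([style, canonical_token, specific_token], axis=0)
    return np.concatenate([prefix, motion], axis=0)


rng = np.random.default_rng(1)
space = ToyStyleSpace(text_dim=16, image_dim=32, latent_dim=8)

# Style prompts from two modalities end up as vectors of the same size.
text_style = space.embed_text(rng.standard_normal(16))
image_style = space.embed_image(rng.standard_normal(32))
similarity = float(text_style @ image_style)  # cosine similarity

# 24 frames of 8-dimensional motion features plus two placeholder tokens.
motion = rng.standard_normal((24, 8))
canonical_token = np.zeros(8)  # placeholder for a learned token
specific_token = np.ones(8)    # placeholder for a learned token
sequence = build_conditioned_sequence(
    motion, text_style, canonical_token, specific_token
)
print(sequence.shape, round(similarity, 3))
```

Because both modalities map to vectors of the same dimensionality, either `text_style` or `image_style` could be passed to `build_conditioned_sequence` unchanged, which is the flexibility the cross-modality latent space is meant to provide.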