Text-driven motion editing and intra-structural retargeting, where source and target share topology but may differ in bone lengths, are traditionally handled by fragmented pipelines with incompatible inputs and representations: editing relies on specialized generative steering, while retargeting is deferred to geometric post-processing. We present a unifying perspective where both tasks are cast as instances of conditional transport within a single generative framework. By leveraging recent advances in flow matching, we demonstrate that editing and retargeting are fundamentally the same generative task, distinguished only by which conditioning signal, semantic or structural, is modulated during inference. We implement this vision via a rectified-flow motion model jointly conditioned on text prompts and target skeletal structures. Our architecture extends a DiT-style transformer with per-joint tokenization and explicit joint self-attention to strictly enforce kinematic dependencies, while a multi-condition classifier-free guidance strategy balances text adherence with skeletal conformity. Experiments on SnapMoGen and a multi-character Mixamo subset show that a single trained model supports text-to-motion generation, zero-shot editing, and zero-shot intra-structural retargeting. This unified approach simplifies deployment and improves structural consistency compared to task-specific baselines.
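As an illustration of the multi-condition classifier-free guidance described above, the following is a minimal sketch and not the authors' implementation. It assumes the common compositional form of guidance, in which separate text and skeleton terms are added to an unconditional velocity, and it integrates the resulting flow with plain Euler steps. The names `guided_velocity`, `sample`, `toy_velocity`, `w_text`, and `w_skel` are hypothetical, and `toy_velocity` is a stand-in for the trained network.

```python
import numpy as np

def guided_velocity(v_fn, x, t, text, skel, w_text=2.0, w_skel=1.5):
    """Combine unconditional and single-condition velocities.

    Assumed form (not confirmed by the abstract):
    v = v_null + w_text * (v_text - v_null) + w_skel * (v_skel - v_null)
    """
    v_null = v_fn(x, t, None, None)   # both conditions dropped
    v_text = v_fn(x, t, text, None)   # text condition only
    v_skel = v_fn(x, t, None, skel)   # skeleton condition only
    return v_null + w_text * (v_text - v_null) + w_skel * (v_skel - v_null)

def sample(v_fn, x0, text, skel, steps=10, **weights):
    """Euler integration of dx/dt = v from t = 0 (noise) to t = 1 (motion)."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * guided_velocity(v_fn, x, i * dt, text, skel, **weights)
    return x

def toy_velocity(x, t, text, skel):
    """Hypothetical stand-in for the trained model: pulls x toward a
    target assembled from whichever conditions are present."""
    target = np.zeros_like(x)
    if text is not None:
        target = target + text
    if skel is not None:
        target = target + skel
    return target - x

# Toy "motion": 4 joints x 3 features; the conditions are plain arrays here.
rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 3))
text_a, text_b = np.full((4, 3), 2.0), np.full((4, 3), 3.0)
skel_a, skel_b = np.full((4, 3), -1.0), np.full((4, 3), -0.5)

base = sample(toy_velocity, noise, text_a, skel_a)
edited = sample(toy_velocity, noise, text_b, skel_a)      # change text only
retargeted = sample(toy_velocity, noise, text_a, skel_b)  # change skeleton only
```

In this sketch, editing and retargeting call the same sampler from the same starting noise and differ only in which conditioning argument is changed, which mirrors the unified view taken in the abstract.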