Previous motion generation methods are limited to pre-rigged 3D human models, which hinders their application to the animation of diverse non-rigged characters. In this work, we present TapMo, a Text-driven Animation Pipeline for synthesizing Motion in a broad spectrum of skeleton-free 3D characters. The pivotal innovation in TapMo is its use of shape deformation-aware features as a condition to guide the diffusion model, thereby enabling the generation of mesh-specific motions for various characters. Specifically, TapMo comprises two main components: the Mesh Handle Predictor and the Shape-aware Motion Diffusion module. The Mesh Handle Predictor predicts skinning weights and clusters mesh vertices into adaptive handles for deformation control, which eliminates the need for traditional skeletal rigging. The Shape-aware Motion Diffusion module synthesizes motion with mesh-specific adaptations. It takes text-guided motions and the mesh features extracted during the first stage as input, and preserves the geometric integrity of the animations by accounting for the character's shape and deformation. Trained in a weakly-supervised manner, TapMo can accommodate a multitude of non-human meshes, both with and without associated text motions. We demonstrate the effectiveness and generalizability of TapMo through rigorous qualitative and quantitative experiments. Our results show that TapMo consistently outperforms existing auto-animation methods, delivering higher-quality animations for both seen and unseen heterogeneous 3D characters.
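To make the idea of handle-based deformation concrete, the following is a minimal sketch of how predicted skinning weights could drive a mesh without a skeleton. It is not TapMo's implementation: it assumes a standard linear-blend formulation in which each handle carries a rigid transform applied about its center, and the names `handle_centers` and `deform`, as well as the array layouts, are hypothetical choices made for illustration.

```python
import numpy as np


def handle_centers(vertices, weights):
    """Weighted centroid of the vertices assigned to each handle.

    vertices: (N, 3) mesh vertex positions.
    weights:  (N, K) skinning weights, each row assumed to sum to 1.
    Returns:  (K, 3) handle centers.
    """
    mass = weights.sum(axis=0)            # (K,) total weight per handle
    mass = np.maximum(mass, 1e-8)         # guard against empty handles
    return (weights.T @ vertices) / mass[:, None]


def deform(vertices, weights, rotations, translations):
    """Blend per-handle rigid transforms into per-vertex positions.

    rotations:    (K, 3, 3) one rotation matrix per handle (assumed input).
    translations: (K, 3) one translation per handle (assumed input).
    Returns:      (N, 3) deformed vertex positions.
    """
    centers = handle_centers(vertices, weights)                 # (K, 3)
    # Position of every vertex relative to every handle center.
    local = vertices[None, :, :] - centers[:, None, :]          # (K, N, 3)
    # Rotate about each handle center, then move back and translate.
    rotated = np.einsum("kab,knb->kna", rotations, local)       # (K, N, 3)
    moved = rotated + centers[:, None, :] + translations[:, None, :]
    # Blend the K candidate positions with the skinning weights.
    return np.einsum("nk,kna->na", weights, moved)


if __name__ == "__main__":
    # Toy mesh: four vertices softly assigned to two handles.
    V = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    W = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]])
    R = np.stack([np.eye(3), np.eye(3)])   # identity rotations
    t = np.zeros((2, 3))                   # no translation
    print(deform(V, W, R, t))
```

Under this formulation, a motion generator only has to output one rotation and one translation per handle per frame, which is one plausible reading of how handles can stand in for traditional skeletal joints.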