While recent advancements in generative AI have substantially accelerated static 3D model creation workflows, the synthesis of category-agnostic 3D animations remains a significant bottleneck in 3D asset production. Current methods for category-agnostic animation generation exhibit critical limitations in inference speed, motion quality, and adherence to textual prompts, thereby leaving the process dependent on labor-intensive manual artistry. To address these challenges, this paper introduces AnimaSpark, a novel pipeline for category-agnostic 3D animation generation. Our approach is motivated by the key insight that for many fundamental motions in the 3D world, the corresponding joint transformations can often be effectively modeled within a two-dimensional subspace. The pipeline begins by rendering a rigged static 3D model into multi-layered image representations of its mesh and skeleton, which are subsequently fed into a video generation model. We then employ a keypoint tracking algorithm on the generated video to capture the motion of the skeletal joints projected onto the camera's viewing plane. In the final stage, we distill the planar translations and rotations from these tracked keypoints and lift them from the 2D domain into 3D space to animate the character. Comprehensive evaluations reveal that our method achieves superior performance over existing state-of-the-art techniques across key metrics, including text-motion alignment, quality of motion, and computational efficiency.
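The final lifting stage can be illustrated with a minimal sketch. The abstract does not specify how AnimaSpark performs this step, so everything below is an assumption made for illustration only: the function names (`planar_angle`, `lift_planar_rotation`, `lift_planar_translation`) are hypothetical, image-plane coordinates are taken as x to the right and y up, and `view_axis` is assumed to be the unit vector pointing from the scene toward the camera. Under those assumptions, a change in a bone's 2D direction between the rest pose and a tracked frame becomes a 3D rotation about the viewing axis, and a 2D displacement becomes a 3D offset along the camera's right and up vectors.

```python
import numpy as np

def planar_angle(parent_2d, child_2d):
    """Angle (radians) of the parent->child bone direction in the image plane."""
    d = np.asarray(child_2d, dtype=float) - np.asarray(parent_2d, dtype=float)
    return np.arctan2(d[1], d[0])

def axis_angle_to_matrix(axis, angle):
    """Rodrigues' formula: 3x3 rotation by `angle` about `axis`."""
    axis = np.asarray(axis, dtype=float)
    x, y, z = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def lift_planar_rotation(rest_parent, rest_child, cur_parent, cur_child, view_axis):
    """Assumed lifting rule: the in-plane change of bone direction is applied
    as a 3D rotation about the camera's viewing axis."""
    delta = planar_angle(cur_parent, cur_child) - planar_angle(rest_parent, rest_child)
    delta = (delta + np.pi) % (2.0 * np.pi) - np.pi  # wrap to [-pi, pi)
    return axis_angle_to_matrix(view_axis, delta)

def lift_planar_translation(delta_2d, cam_right, cam_up, scale=1.0):
    """Assumed lifting rule: a 2D displacement maps to a 3D offset inside the
    camera plane; `scale` (image units to world units) is a free parameter."""
    dx, dy = delta_2d
    return scale * (dx * np.asarray(cam_right, dtype=float)
                    + dy * np.asarray(cam_up, dtype=float))

# Usage: a bone that pointed along +x in the rest frame and along +y in the
# tracked frame, seen by a camera whose viewing axis is +z.
R = lift_planar_rotation((0, 0), (1, 0), (0, 0), (0, 1), view_axis=(0, 0, 1))
t = lift_planar_translation((2.0, -1.0), cam_right=(1, 0, 0), cam_up=(0, 1, 0), scale=0.5)
```

A real system would also need to handle joints whose motion leaves the viewing plane, tracking noise, and the conversion from pixel coordinates (y pointing down) to the convention assumed here; none of that is covered by this sketch.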