Recent 3D generative models can synthesize high-quality assets, but their outputs are typically static: they lack the skeletal rigs, joint hierarchies, and skinning weights required for animation. This limits their use in games, film, simulation, virtual agents, and embodied AI, where assets must not only look plausible but also move plausibly. We introduce Rigel3D, a generative method for animation-ready 3D assets represented as rigged meshes. Unlike post-hoc auto-rigging methods that attach rigs to completed shapes, our method jointly models geometry and rig structure through coupled structured latent representations of the surface and the skeleton. A rig-aware autoencoder decodes these representations into mesh geometry, skeleton topology, joint coordinates, and skinning weights, while a two-stage latent generative model synthesizes both the surface and skeleton representations for image-conditioned generation. To support downstream animation workflows, we further introduce an open-vocabulary joint labeling module that embeds generated joints into a shared vision-language space, enabling correspondence to arbitrary retargeting templates. Experiments on large-scale rigged asset datasets demonstrate that our method generates diverse, high-quality animation-ready assets and outperforms existing rigging baselines across multiple metrics.