Rigged 3D assets are fundamental to 3D deformation and animation. However, existing 3D generation methods struggle to generate animatable geometry, while rigging techniques lack fine-grained structural control over skeleton creation. To address these limitations, we introduce Stroke3D, a novel framework that directly generates rigged meshes from two user inputs: 2D drawn strokes and a descriptive text prompt. Our approach pioneers a two-stage pipeline that separates generation into: 1) Controllable Skeleton Generation, where we employ the Skeletal Graph VAE (Sk-VAE) to encode the skeleton's graph structure into a latent space, in which the Skeletal Graph DiT (Sk-DiT) generates a skeletal embedding. The generation process is conditioned on both the text for semantics and the 2D strokes for explicit structural control, and the VAE's decoder reconstructs the final high-quality 3D skeleton; and 2) Enhanced Mesh Synthesis via TextuRig and SKA-DPO, where we synthesize a textured mesh conditioned on the generated skeleton. For this stage, we first enhance an existing skeleton-to-mesh model by augmenting its training data with TextuRig, a dataset of captioned, textured, and rigged meshes curated from Objaverse-XL. We then apply a preference optimization strategy, SKA-DPO, guided by a skeleton-mesh alignment score, to further improve geometric fidelity. Together, these components enable a more intuitive workflow for creating ready-to-animate 3D content. To the best of our knowledge, our work is the first to generate rigged 3D meshes conditioned on user-drawn 2D strokes. Extensive experiments demonstrate that Stroke3D produces plausible skeletons and high-quality meshes.
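To make the preference-optimization step more concrete, the following is a minimal sketch of how a score-guided DPO objective could look. It is not the SKA-DPO implementation: the abstract does not define the skeleton-mesh alignment score or the exact loss, so `pick_pair`, `score_fn`, and `beta` are hypothetical names, and `dpo_loss` is the generic DPO loss on log-likelihoods. A diffusion-based mesh generator would likely need a variant of this loss.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Generic DPO loss for one (preferred, rejected) pair.

    logp_*     : log-likelihoods under the model being tuned
    ref_logp_* : log-likelihoods under the frozen reference model
    beta       : assumed temperature; not taken from the paper
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) computed as softplus(-margin) for stability
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def pick_pair(candidates, score_fn):
    """Return (preferred, rejected) among candidates ranked by score_fn.

    score_fn stands in for a skeleton-mesh alignment score, whose
    definition is not given in the abstract (hypothetical here).
    """
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[0], ranked[-1]
```

Under these assumptions, several meshes would be sampled for the same skeleton, `pick_pair` would select the best- and worst-aligned samples, and `dpo_loss` would push the model toward the better-aligned one relative to the reference model.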