Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models

Rigged 3D assets are fundamental to 3D deformation and animation. However, existing 3D generation methods struggle to generate animatable geometry, while rigging techniques lack fine-grained structural control over skeleton creation. To address these limitations, we introduce Stroke3D, a novel framework that directly generates rigged meshes from two user inputs: 2D drawn strokes and a descriptive text prompt. Our approach pioneers a two-stage pipeline that separates generation into: 1) Controllable Skeleton Generation, where we employ the Skeletal Graph VAE (Sk-VAE) to encode the skeleton's graph structure into a latent space, in which the Skeletal Graph DiT (Sk-DiT) generates a skeletal embedding. The generation process is conditioned on both the text for semantics and the 2D strokes for explicit structural control, and the VAE's decoder reconstructs the final high-quality 3D skeleton; and 2) Enhanced Mesh Synthesis via TextuRig and SKA-DPO, where we synthesize a textured mesh conditioned on the generated skeleton. For this stage, we first enhance an existing skeleton-to-mesh model by augmenting its training data with TextuRig, a dataset of textured and rigged meshes with captions, curated from Objaverse-XL. Additionally, we employ a preference optimization strategy, SKA-DPO, guided by a skeleton-mesh alignment score, to further improve geometric fidelity. Together, these components enable a more intuitive workflow for creating ready-to-animate 3D content. To the best of our knowledge, our work is the first to generate rigged 3D meshes conditioned on user-drawn 2D strokes. Extensive experiments demonstrate that Stroke3D produces plausible skeletons and high-quality meshes.
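
The preference-optimization step in the second stage can be illustrated with a minimal sketch. The abstract defines neither the skeleton-mesh alignment score nor the exact SKA-DPO objective, so everything below is an assumption: `alignment_score` is a hypothetical proxy (negative mean distance from each joint to its nearest mesh vertex), and `dpo_loss` is the standard DPO objective used as a stand-in, not the authors' formulation. The function names and the `beta` default are likewise illustrative.

```python
import math


def alignment_score(joints, vertices):
    """Hypothetical proxy for a skeleton-mesh alignment score (an assumption,
    not the paper's definition): the negative mean distance from each joint
    to its nearest mesh vertex. Higher means better aligned."""
    total = 0.0
    for joint in joints:
        total += min(math.dist(joint, vertex) for vertex in vertices)
    return -total / len(joints)


def make_preference_pair(joints, mesh_a, mesh_b):
    """Order two candidate meshes (given as vertex lists) as
    (preferred, rejected) according to the proxy alignment score."""
    if alignment_score(joints, mesh_a) >= alignment_score(joints, mesh_b):
        return mesh_a, mesh_b
    return mesh_b, mesh_a


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w / logp_l: log-likelihoods of the preferred / rejected sample
    under the model being tuned; ref_logp_*: the same quantities under the
    frozen reference model. Returns -log(sigmoid(margin))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) equals softplus(-m); computed in a numerically stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Usage: pick the better-aligned mesh, then score the pair.
joints = [(0.0, 0.0, 0.0)]
near_mesh = [(0.0, 0.0, 1.0), (5.0, 5.0, 5.0)]
far_mesh = [(0.0, 0.0, 3.0)]
preferred, rejected = make_preference_pair(joints, far_mesh, near_mesh)
loss = dpo_loss(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.0, beta=1.0)
```

In this sketch the loss falls below log 2 once the tuned model favors the preferred sample more strongly than the reference model does, which is the behavior a preference-optimization step of this kind relies on.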
