Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models

Rigged 3D assets are fundamental to 3D deformation and animation. However, existing 3D generation methods struggle to produce animatable geometry, while rigging techniques lack fine-grained structural control over skeleton creation. To address these limitations, we introduce Stroke3D, a novel framework that generates rigged meshes directly from two user inputs: 2D drawn strokes and a descriptive text prompt. Our approach pioneers a two-stage pipeline. 1) Controllable Skeleton Generation: we employ the Skeletal Graph VAE (Sk-VAE) to encode the skeleton's graph structure into a latent space, in which the Skeletal Graph DiT (Sk-DiT) generates a skeletal embedding. Generation is conditioned on both the text, for semantics, and the 2D strokes, for explicit structural control, and the VAE's decoder reconstructs the final high-quality 3D skeleton from the embedding. 2) Enhanced Mesh Synthesis via TextuRig and SKA-DPO: we synthesize a textured mesh conditioned on the generated skeleton. For this stage, we first enhance an existing skeleton-to-mesh model by augmenting its training data with TextuRig, a dataset of captioned, textured, and rigged meshes curated from Objaverse-XL. We then apply a preference optimization strategy, SKA-DPO, guided by a skeleton-mesh alignment score, to further improve geometric fidelity. Together, these stages enable a more intuitive workflow for creating ready-to-animate 3D content. To the best of our knowledge, our work is the first to generate rigged 3D meshes conditioned on user-drawn 2D strokes. Extensive experiments demonstrate that Stroke3D produces plausible skeletons and high-quality meshes.
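
The preference optimization step in stage 2 can be illustrated with a minimal sketch. It assumes the generic DPO objective for likelihood-based models and a hypothetical scalar alignment score used to pick the preferred and rejected mesh for each skeleton. The abstract does not define the score or say which DPO variant SKA-DPO uses, so the function names, the `beta` value, and the pairing rule below are illustrative assumptions, not the authors' method.

```python
import math


def make_preference_pair(candidates, score_fn):
    """Pick (preferred, rejected) from candidate meshes for one skeleton.

    `score_fn` is a hypothetical stand-in for a skeleton-mesh alignment
    score where higher means better alignment.
    """
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[0], ranked[-1]


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Generic DPO loss for a single preference pair.

    loss = -log(sigmoid(beta * ((logp_win - ref_logp_win)
                                - (logp_lose - ref_logp_lose))))
    The log-probabilities come from the model being tuned and from a
    frozen reference copy. `beta` is an assumed value.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(m)) equals softplus(-m); computed in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Toy candidates: strings stand in for meshes, length for the score.
    win, lose = make_preference_pair(["a", "bbb", "cc"], len)
    print(win, lose)
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss falls as the tuned model assigns relatively more probability to the preferred mesh than the reference model does, which is how a score-based ranking can steer generation toward better skeleton-mesh alignment.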
