Articulated 3D object generation is fundamental for creating realistic, functional, and interactive virtual assets that are not merely static. We introduce MeshArt, a hierarchical transformer-based approach to generate articulated 3D meshes with clean, compact geometry, reminiscent of human-crafted 3D models. We approach articulated mesh generation in a part-by-part fashion across two stages. First, we generate a high-level articulation-aware object structure; then, based on this structural information, we synthesize each part's mesh faces. Key to our approach is modeling both articulation structures and part meshes as sequences of quantized triangle embeddings, leading to a unified hierarchical framework with transformers for autoregressive generation. Object part structures are first generated as their bounding primitives and articulation modes; a second transformer, guided by these articulation structures, then generates each part's mesh triangles. To ensure coherence among generated parts, we introduce structure-guided conditioning that also incorporates local part mesh connectivity. MeshArt shows significant improvements over the state of the art, with a 57.1% improvement in structure coverage and a 209-point improvement in mesh generation FID.
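To illustrate what "sequences of quantized triangle embeddings" can mean in practice, the following is a minimal sketch of generic nearest-neighbor vector quantization. It is not MeshArt's implementation: the abstract does not say how the embeddings or the codebook are learned, so the hand-written `codebook`, the 2D toy `triangle_embeddings`, and the helper names `quantize` and `dequantize` are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embeddings: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each continuous embedding to the index of its nearest codebook entry."""
    tokens = []
    for emb in embeddings:
        nearest = min(
            range(len(codebook)),
            key=lambda i: squared_distance(emb, codebook[i]),
        )
        tokens.append(nearest)
    return tokens


def dequantize(tokens: List[int], codebook: List[Vector]) -> List[Vector]:
    """Look up the codebook vector for each discrete token."""
    return [codebook[t] for t in tokens]


# Hypothetical toy data: a real system would learn both the per-triangle
# embeddings and the codebook; here they are written by hand.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
triangle_embeddings = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.1)]

tokens = quantize(triangle_embeddings, codebook)
print(tokens)  # [0, 3, 2]
```

Once every triangle is reduced to a discrete index like this, a mesh (or a set of bounding primitives) becomes a token sequence that an autoregressive transformer can model one token at a time.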