We introduce Particulate, a feed-forward model that, given a 3D mesh of an object, infers its articulation: its 3D parts, their kinematic structure, and their motion constraints. The model is based on a transformer network, the Part Articulation Transformer, which predicts all of these parameters for all joints. We train the network end-to-end on a diverse collection of articulated 3D assets from public datasets. During inference, Particulate maps the output of the network back to the input mesh, yielding a fully articulated 3D model in seconds, much faster than prior approaches that require per-object optimization. Particulate also works on AI-generated 3D assets, enabling the generation of articulated 3D objects from a single (real or synthetic) image when combined with an off-the-shelf image-to-3D model. We further introduce a new, challenging benchmark for 3D articulation estimation, curated from high-quality public 3D assets, and redesign the evaluation protocol to be more consistent with human preferences. Empirically, Particulate significantly outperforms state-of-the-art approaches.
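To make the three predicted quantities concrete (parts, kinematic structure, motion constraints), the following is a minimal sketch of what a fully articulated model could look like once predictions are mapped back onto a mesh. It is not the representation used by Particulate, which the abstract does not specify: the `Joint` record, its field names, and the `move_point` and `articulate` helpers are all hypothetical, and the sketch assumes that each vertex carries one part label and that a joint is either revolute or prismatic with a single axis and a scalar range.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Joint:
    """Hypothetical joint record linking a child part to its parent part."""
    parent: int                   # index of the parent part (kinematic structure)
    child: int                    # index of the child part
    kind: str                     # "revolute" or "prismatic"
    axis: Vec3                    # unit direction of the motion axis
    origin: Vec3                  # a point on the axis (used by revolute joints)
    limits: Tuple[float, float]   # motion range: radians or length units


def move_point(p: Vec3, joint: Joint, q: float) -> Vec3:
    """Move one point of the child part to joint state q, clamped to the limits."""
    q = max(joint.limits[0], min(joint.limits[1], q))
    ax, ay, az = joint.axis
    if joint.kind == "prismatic":
        # Translate along the axis by q.
        return (p[0] + q * ax, p[1] + q * ay, p[2] + q * az)
    # Revolute: Rodrigues' rotation about the axis passing through joint.origin.
    ox, oy, oz = joint.origin
    vx, vy, vz = p[0] - ox, p[1] - oy, p[2] - oz
    c, s = math.cos(q), math.sin(q)
    dot = ax * vx + ay * vy + az * vz
    cx, cy, cz = ay * vz - az * vy, az * vx - ax * vz, ax * vy - ay * vx
    return (
        ox + vx * c + cx * s + ax * dot * (1.0 - c),
        oy + vy * c + cy * s + ay * dot * (1.0 - c),
        oz + vz * c + cz * s + az * dot * (1.0 - c),
    )


def articulate(vertices: List[Vec3], vertex_part: List[int],
               joint: Joint, q: float) -> List[Vec3]:
    """Move only the vertices labelled with the joint's child part.

    Simplification: parts further down the kinematic tree are not moved.
    """
    return [
        move_point(v, joint, q) if part == joint.child else v
        for v, part in zip(vertices, vertex_part)
    ]
```

For example, a cabinet door could be modelled as part 1 hinged to the body (part 0) by a revolute joint about a vertical axis, and a drawer as a prismatic joint whose `limits` bound how far it slides out. A full pipeline would also need to propagate motion to descendant parts, which this sketch omits.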