A bottleneck in learning to understand articulated 3D objects is the lack of large and diverse datasets. In this paper, we propose to leverage large language models (LLMs) to close this gap and generate articulated assets at scale. We reduce the problem of generating an articulated 3D asset to that of writing a program that builds it. We then introduce a new agentic system, Articraft, that writes such programs automatically. We design a programmatic interface and harness to help the LLM do so effectively. The LLM writes code against a domain-specific SDK for defining parts, composing geometry, specifying joints, and writing tests to validate the resulting assets. The harness exposes a restricted workspace and interface to the LLM, validates the generated assets, and returns structured feedback. In this way, the LLM is not distracted by details such as authoring a URDF file or managing a complex software environment. We show that this approach produces higher-quality assets than both state-of-the-art articulated-asset generators and general-purpose coding agents. Using Articraft, we build Articraft-10K, a curated dataset of over 10K articulated assets spanning 245 categories, and show its utility both for training models of articulated assets and in downstream applications such as robotics simulation and virtual reality.
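To make the division of labour between the generated program and the harness concrete, here is a minimal Python sketch of what such an SDK and harness could look like. Everything in it is hypothetical and illustrative, not Articraft's actual interface. This includes the names `Asset`, `Part`, `Joint`, `validate` and `to_urdf`, the box-only geometry, and the placeholder URDF effort/velocity limits. The program declares parts and joints and attaches its own tests. The harness-side `validate` checks that the parts form a single kinematic tree with sane joint limits, runs the tests, and returns structured feedback as a list of dictionaries. `to_urdf` then serializes the asset, so the program never writes URDF directly.

```python
from dataclasses import dataclass, field
from typing import Callable
import xml.etree.ElementTree as ET

# HYPOTHETICAL mini-SDK for illustration only; not Articraft's actual API.
@dataclass
class Part:
    name: str
    size: tuple[float, float, float]  # box extents (x, y, z); boxes only, for brevity

@dataclass
class Joint:
    name: str
    kind: str                          # "revolute", "prismatic" or "fixed"
    parent: str
    child: str
    axis: tuple[float, float, float] = (0.0, 0.0, 1.0)
    lower: float = 0.0                 # limits: rad (revolute) or m (prismatic)
    upper: float = 0.0

@dataclass
class Asset:
    name: str
    parts: list[Part] = field(default_factory=list)
    joints: list[Joint] = field(default_factory=list)
    tests: list[Callable[["Asset"], bool]] = field(default_factory=list)

    def part(self, name: str, size: tuple[float, float, float]) -> str:
        self.parts.append(Part(name, size))
        return name
    def joint(self, name: str, kind: str, parent: str, child: str, **kw) -> None:
        self.joints.append(Joint(name, kind, parent, child, **kw))

def validate(asset: Asset) -> list[dict]:
    """Harness-side checks; returns structured feedback (an empty list means valid)."""
    issues: list[dict] = []
    names = {p.name for p in asset.parts}
    parent_of: dict[str, str] = {}
    for j in asset.joints:
        for end in (j.parent, j.child):
            if end not in names:
                issues.append({"joint": j.name, "error": f"unknown part '{end}'"})
        if j.kind != "fixed":
            if j.lower > j.upper:
                issues.append({"joint": j.name, "error": "lower limit exceeds upper"})
            if all(a == 0 for a in j.axis):
                issues.append({"joint": j.name, "error": "zero-length axis"})
        if j.child in parent_of:
            issues.append({"joint": j.name, "error": f"part '{j.child}' has two parents"})
        parent_of[j.child] = j.parent
    # The parts must form a single kinematic tree: exactly one root and no cycles.
    roots = names - set(parent_of)
    if len(roots) != 1:
        issues.append({"asset": asset.name, "error": f"expected one root part, found {sorted(roots)}"})
    def reaches_root(node: str) -> bool:
        seen = set()
        while node in parent_of:
            if node in seen:
                return False
            seen.add(node)
            node = parent_of[node]
        return True
    cyclic = sorted(n for n in names if not reaches_root(n))
    if cyclic:
        issues.append({"asset": asset.name, "error": f"cycle through parts {cyclic}"})
    # Finally, run the tests the LLM wrote alongside the asset.
    for test in asset.tests:
        if not test(asset):
            issues.append({"test": test.__name__, "error": "test returned False"})
    return issues

def to_urdf(asset: Asset) -> str:
    """Serialize to a minimal URDF string so the program never writes XML by hand."""
    robot = ET.Element("robot", name=asset.name)
    for p in asset.parts:
        link = ET.SubElement(robot, "link", name=p.name)
        geom = ET.SubElement(ET.SubElement(link, "visual"), "geometry")
        ET.SubElement(geom, "box", size=" ".join(map(str, p.size)))
    for j in asset.joints:
        el = ET.SubElement(robot, "joint", name=j.name, type=j.kind)
        ET.SubElement(el, "parent", link=j.parent)
        ET.SubElement(el, "child", link=j.child)
        if j.kind != "fixed":
            ET.SubElement(el, "axis", xyz=" ".join(map(str, j.axis)))
            # URDF requires effort/velocity on these joints; the values are placeholders.
            ET.SubElement(el, "limit", lower=str(j.lower), upper=str(j.upper),
                          effort="10", velocity="1")
    return ET.tostring(robot, encoding="unicode")

# Usage: a cabinet with one hinged door, plus one LLM-style test (both hypothetical).
cab = Asset("cabinet")
body = cab.part("body", (0.6, 0.4, 0.8))
door = cab.part("door", (0.02, 0.4, 0.8))
cab.joint("hinge", "revolute", body, door, axis=(0, 0, 1), upper=1.57)
def door_can_open(a: Asset) -> bool:
    return next(j for j in a.joints if j.name == "hinge").upper > 0
cab.tests.append(door_can_open)
print(validate(cab))  # → []
```

In this sketch, a malformed program comes back as a short list of machine-readable errors such as `{"joint": "j", "error": "unknown part 'ghost'"}`. An agent can act on these in its next turn, which is one plausible reason to keep such checks in the harness rather than leaving them to a downstream simulator.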