Learning in simulation provides a useful foundation for scaling robotic manipulation capabilities. However, this paradigm is often limited by a shortage of data-generation-ready digital assets, in both scale and diversity. In this work, we present ManiTwin, an automated and efficient pipeline for generating data-generation-ready digital object twins. Our pipeline transforms a single image into a simulation-ready, semantically annotated 3D asset, enabling large-scale robotic manipulation data generation. Using this pipeline, we construct ManiTwin-100K, a dataset containing 100K high-quality annotated 3D assets. Each asset is equipped with physical properties, language descriptions, functional annotations, and verified manipulation proposals. Experiments demonstrate that ManiTwin provides an efficient asset synthesis and annotation workflow, and that ManiTwin-100K offers high-quality, diverse assets for manipulation data generation, random scene synthesis, and visual question answering (VQA) data generation, establishing a strong foundation for scalable simulation data synthesis and policy learning. Our webpage is available at https://manitwin.github.io/.