DataLadder: A Simulation-Enabled Interconversion Toolchain for the Embodied Data Pyramid

Peidong Liu, Yongce Liu, Songyan Guo, Fuyuan Ma, Zhihao Yuan, Ao Li, Zengjue Chen, Wenhao Li, Tianle Zhang, Mingyang Li, Jiale Zhang, Junzhe Xiong, Zhiyuan Xiang, Dafeng Chi, Yuzheng Zhuang, Yihang Li, Qingrong He, Jiaming Liang, Chen Cai, Peng Hao, Mingxi Luo, Song Wang, Junwu Xiong, Ruodai Li, Liyi Luo, Wei Tan, Dongjiang Li, Jiawei Li, Hui Shen, Yicheng Gong, Liang Lin

Source: arXiv. Project page: https://joyai-sim.github.io/

Generalist robot policies require trustworthy evaluation and robot-usable training data, but both are difficult to scale with physical robots alone. Real-robot trials and demonstrations remain the most faithful source of deployment signals, yet they are slow, costly, and hard to reproduce. We present DataLadder, a simulation-enabled interconversion toolchain for human-robot-aligned model evaluation and data generation, denoted Robot $\rightleftharpoons$ Simulation $\rightleftharpoons$ Human. On the one hand, the Robot $\rightarrow$ Simulation $\rightarrow$ Human pathway supports human-robot-aligned model evaluation: it reconstructs real-robot tabletop organization tasks as calibrated digital twins for scalable evaluation and uses human embodied feedback to inspect and refine the naturalness of simulated motions. On the other hand, the Human $\rightarrow$ Simulation $\rightarrow$ Robot pathway supports human-robot-aligned data generation: it lifts egocentric human demonstrations into simulation, checks them against robot physical constraints, and converts them into robot-centered trajectories, annotations, and visual observations. Together, these pathways use the JoySim simulator both as a scalable evaluation layer and as a physical-consistency filter for robot data generation. We further package the core reconstruction, simulation, rendering, and realism-augmentation modules as cloud services on JD Cloud, turning the system into reusable infrastructure for robot data generation and model evaluation.
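The abstract describes JoySim as a physical-consistency filter that checks lifted human demonstrations against robot physical constraints, but it does not say which constraints are checked or how. The sketch below only illustrates the general idea of such a filter under stated assumptions: each candidate is a joint-space trajectory sampled at a fixed timestep `dt`, and only per-joint position limits and finite-difference velocity limits are enforced (no collisions, contacts, torques, or dynamics). The names `JointLimits`, `first_violation`, and `keep_feasible` are hypothetical and are not part of DataLadder or JoySim.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class JointLimits:
    """Hypothetical per-joint limits for a robot arm (radians and rad/s)."""
    lower: Sequence[float]
    upper: Sequence[float]
    max_velocity: Sequence[float]


def first_violation(
    trajectory: List[Sequence[float]],
    limits: JointLimits,
    dt: float,
) -> Optional[Tuple[int, int, str]]:
    """Return (timestep, joint, reason) for the first violated limit, or None.

    Assumption: velocities are estimated with backward finite differences,
    |q[t][j] - q[t-1][j]| / dt, which is only a rough proxy for real dynamics.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    prev = None
    for t, q in enumerate(trajectory):
        for j, angle in enumerate(q):
            # Position check against the joint's allowed range.
            if not (limits.lower[j] <= angle <= limits.upper[j]):
                return (t, j, "position")
            # Velocity check between consecutive samples.
            if prev is not None:
                velocity = abs(angle - prev[j]) / dt
                if velocity > limits.max_velocity[j]:
                    return (t, j, "velocity")
        prev = q
    return None


def keep_feasible(
    trajectories: List[List[Sequence[float]]],
    limits: JointLimits,
    dt: float,
) -> List[List[Sequence[float]]]:
    """Filter step: keep only trajectories that violate no limit."""
    return [traj for traj in trajectories if first_violation(traj, limits, dt) is None]


if __name__ == "__main__":
    limits = JointLimits(lower=[-1.0, -1.0], upper=[1.0, 1.0], max_velocity=[2.0, 2.0])
    smooth = [[0.0, 0.0], [0.1, 0.05], [0.2, 0.1]]
    too_fast = [[0.0, 0.0], [0.5, 0.0]]  # joint 0 moves at 5 rad/s
    out_of_range = [[0.0, 1.5]]          # joint 1 exceeds its upper limit
    print(first_violation(too_fast, limits, dt=0.1))
    print(len(keep_feasible([smooth, too_fast, out_of_range], limits, dt=0.1)))
```

Returning the first violation, rather than a bare boolean, is a design choice that makes it possible to log why a demonstration was rejected; a production filter would likely replace these kinematic checks with full simulation rollouts.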

翻译：通用型机器人策略需要可信的评估和可用的训练数据，但仅依靠物理机器人难以实现规模化。真实机器人实验和演示仍是部署信号最可靠的来源，但成本高、速度慢且难以复现。我们提出DataLadder——一种面向人-机对齐的模型评估与数据生成的仿真驱动互转换工具链，记为Robot $\rightleftharpoons$ Simulation $\rightleftharpoons$ Human。一方面，Robot $\rightarrow$ Simulation $\rightarrow$ Human路径通过将真实机器人桌面整理任务重建为可标定的数字孪生，实现人机对齐的模型可扩展评估，同时利用人类具身反馈检查并优化仿真动作的自然性。另一方面，Human $\rightarrow$ Simulation $\rightarrow$ Robot路径支持人机对齐的数据生成：将第一人称人类演示迁移至仿真环境，在机器人物理约束下进行校验，并转换为以机器人为中心的轨迹、标注及视觉观测。两条路径共同利用JoySim仿真器作为可扩展评估层和机器人数据生成的物理一致性滤波器。我们进一步将核心重建、仿真、渲染及真实感增强模块打包为京东云上的云服务，使该系统成为机器人数据生成与模型评估的可复用基础设施。