Humanoid foundation models are advancing faster than we can evaluate them. Real-world testing is expensive and difficult to reproduce, while existing simulation benchmarks focus primarily on tabletop or wheeled robots. A scalable and reproducible benchmark for whole-body humanoid loco-manipulation remains an open problem. To this end, we present SIMPLE, a unified simulation testbed for humanoid policy learning and evaluation. SIMPLE couples the accurate, contact-rich dynamics of MuJoCo with the photorealistic rendering of IsaacSim. It provides a large-scale environment comprising 60 diverse whole-body tasks, 50 indoor scenes, and over 1,000 object assets. To facilitate scalable data collection, the framework integrates two data generation pipelines: automated trajectory generation via motion planning and a low-latency VR teleoperation interface. We further integrate and benchmark mainstream humanoid policies at scale in SIMPLE, including lightweight imitation networks, large vision-language-action (VLA) models, and recent world action models (WAMs). Our experiments reveal a strong correlation between policy performance in simulation and in the real world. Furthermore, we demonstrate that policies trained on data collected in SIMPLE can be transferred zero-shot to physical humanoid robots under similar settings. Together, these results position SIMPLE as a robust and reproducible foundation for humanoid robotics research.
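The sim-to-real correlation claim can be made concrete with a small sketch. The abstract does not say which correlation statistic is used, so the Pearson coefficient below is an assumption, as are the function name `pearson_r` and the variable names. The success rates are placeholder values chosen for illustration and are not results from the paper.

```python
from math import sqrt
from typing import Sequence


def pearson_r(sim: Sequence[float], real: Sequence[float]) -> float:
    """Pearson correlation between paired per-policy success rates."""
    if len(sim) != len(real) or len(sim) < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    n = len(sim)
    mean_sim = sum(sim) / n
    mean_real = sum(real) / n
    # Covariance and variances are left unnormalised; the 1/n factors cancel.
    cov = sum((s - mean_sim) * (r - mean_real) for s, r in zip(sim, real))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_real = sum((r - mean_real) ** 2 for r in real)
    if var_sim == 0 or var_real == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_sim * var_real)


# Placeholder success rates, one entry per policy (hypothetical, NOT from the paper).
sim_success = [0.2, 0.4, 0.6, 0.8]
real_success = [0.1, 0.3, 0.5, 0.7]
r = pearson_r(sim_success, real_success)
```

A value of `r` close to 1 would indicate that policies ranking higher in simulation also tend to rank higher on the physical robot. When only the ordering of policies matters, a rank-based statistic such as Spearman's coefficient may be a more suitable choice.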