GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning

Yufei Jia, Heng Zhang, Ziheng Zhang, Junzhe Wu, Mingrui Yu, Zifan Wang, Dixuan Jiang, Zheng Li, Chenyu Cao, Zhuoyuan Yu, Xun Yang, Haizhou Ge, Yuchi Zhang, Jiayuan Zhang, Zhenbiao Huang, Tianle Liu, Shenyu Chen, Jiacheng Wang, Bin Xie, Xuran Yao, Xiwa Deng, Guangyu Wang, Jinzhi Zhang, Lei Hao, Zhixing Chen, Yuxiang Chen, Anqi Wang, Hongyun Tian, Yiyi Yan, Zhanxiang Cao, Yizhou Jiang, Hanyang Shao, Yue Li, Lu Shi, Bokui Chen, Wei Sui, Hanqing Cui, Yusen Qin, Ruqi Huang, Lei Han, Tiancai Wang, Guyue Zhou

From arXiv. Robotics: Science and Systems 2026.

Embodied AI research is shifting toward vision-centric perception. While massively parallel simulators have driven breakthroughs in proprioception-based locomotion, their potential for vision-informed tasks remains largely untapped because large-scale photorealistic rendering is computationally prohibitive. Furthermore, creating simulation-ready 3D assets relies heavily on labor-intensive manual modeling, and the substantial sim-to-real physical gap hinders the transfer of contact-rich manipulation policies. To address these bottlenecks, we propose GS-Playground, a multi-modal simulation framework designed to accelerate end-to-end perceptual learning. We develop a new high-performance parallel physics engine designed to integrate with a batched 3D Gaussian Splatting (3DGS) rendering pipeline, keeping simulated physics and rendered observations synchronized with high fidelity. The system reaches a throughput of 10^4 FPS at 640×480 resolution, substantially lowering the barrier to large-scale visual reinforcement learning (RL). We also introduce an automated Real2Sim workflow that reconstructs photorealistic, physically consistent, and memory-efficient environments, streamlining the generation of complex simulation-ready scenes. Extensive experiments on locomotion, navigation, and manipulation show that GS-Playground effectively bridges the perceptual and physical gaps across diverse embodied tasks. Project homepage: https://gsplayground.github.io.
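The throughput figure is easier to appreciate in raw pixel terms. The sketch below is plain arithmetic under stated assumptions: it reads the reported 10^4 FPS as an aggregate count of 640×480 frames across all parallel environments (the abstract does not say how the rate is split), and it uses a hypothetical batch of 1024 environments and a hypothetical 3-byte (8-bit RGB) pixel format purely for illustration. None of these helper names come from GS-Playground.

```python
# Back-of-the-envelope arithmetic for the reported rendering throughput.
# Assumption: "10^4 FPS" is an aggregate rate of 640x480 frames per second,
# summed over all parallel environments.

WIDTH, HEIGHT = 640, 480
FRAMES_PER_SECOND = 10**4
BYTES_PER_PIXEL = 3  # hypothetical: 8-bit RGB, no alpha or depth channel


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Total pixels rendered per second at the given resolution and frame rate."""
    return width * height * fps


def microseconds_per_frame(fps: int) -> float:
    """Average wall-clock budget per frame, in microseconds."""
    return 1e6 / fps


def per_env_fps(total_fps: int, num_envs: int) -> float:
    """Frame rate each environment sees if the aggregate is split evenly."""
    return total_fps / num_envs


def output_bytes_per_second(width: int, height: int, fps: int, bpp: int) -> int:
    """Raw image bytes produced per second for a given bytes-per-pixel format."""
    return pixels_per_second(width, height, fps) * bpp


if __name__ == "__main__":
    px = pixels_per_second(WIDTH, HEIGHT, FRAMES_PER_SECOND)
    print(f"pixels per frame:  {WIDTH * HEIGHT:,}")  # 307,200
    print(f"pixels per second: {px:,}")  # 3,072,000,000
    print(f"budget per frame:  {microseconds_per_frame(FRAMES_PER_SECOND):.0f} us")  # 100 us
    # Hypothetical batch size, not taken from the paper.
    print(f"per-env FPS with 1024 envs: {per_env_fps(FRAMES_PER_SECOND, 1024):.2f}")  # 9.77
    rgb = output_bytes_per_second(WIDTH, HEIGHT, FRAMES_PER_SECOND, BYTES_PER_PIXEL)
    print(f"RGB output: {rgb / 1e9:.3f} GB/s")  # 9.216 GB/s
```

Under these assumptions the reported rate corresponds to about 3.07 × 10^9 pixels per second, an average budget of 100 µs per frame, and roughly 9.2 GB/s of raw 8-bit RGB output.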