Embodied AI research is shifting toward vision-centric perception. While massively parallel simulators have catalyzed breakthroughs in proprioception-based locomotion, their potential remains largely untapped for vision-informed tasks because of the prohibitive computational cost of large-scale photorealistic rendering. Furthermore, creating simulation-ready 3D assets relies heavily on labor-intensive manual modeling, and the significant sim-to-real physical gap hinders the transfer of contact-rich manipulation policies. To address these bottlenecks, we propose GS-Playground, a multi-modal simulation framework designed to accelerate end-to-end perceptual learning. We develop a high-performance parallel physics engine designed to integrate with a batch 3D Gaussian Splatting (3DGS) rendering pipeline, ensuring high-fidelity synchronization between physics and rendering. Our system achieves a throughput of 10^4 FPS at 640x480 resolution, significantly lowering the barrier to large-scale visual reinforcement learning (RL). Additionally, we introduce an automated Real2Sim workflow that reconstructs photorealistic, physically consistent, and memory-efficient environments, streamlining the generation of complex simulation-ready scenes. Extensive experiments on locomotion, navigation, and manipulation demonstrate that GS-Playground effectively bridges the perceptual and physical gaps across diverse embodied tasks. Project homepage: https://gsplayground.github.io.