Simulation is central to robot learning, yet the sim-to-real gap remains a major bottleneck. Existing approaches often tackle visual or dynamics gaps separately, overlooking how these individual mismatches accumulate and propagate throughout the robot's state evolution. In this paper, we introduce QuadVerse, an integrated framework that uses reconstructed scenes as a calibration substrate for aligning visual perception, physical interaction, and actuator dynamics. From captured RGB videos, we reconstruct geometry-constrained 3D Gaussian Splatting (3DGS) scenes that support batched photorealistic ego-view rendering and collision-ready semantic mesh extraction. The meshes further enable contact calibration by initializing spatially varying friction priors and refining them through trajectory-based posterior search. To address remaining actuator discrepancies, QuadVerse trains a residual dynamics compensator by replaying real-world trajectories on the contact-calibrated terrain, reducing the entanglement between terrain-induced contact errors and actuator non-idealities. Experiments show that QuadVerse improves reconstruction quality and locomotion tracking over relevant baselines. Leveraging this foundation, we demonstrate robust zero-shot visual-navigation policy deployment without task-specific real-world rollouts.