PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies

Arhan Jain, Mingtong Zhang, Kanav Arora, William Chen, Marcel Torne, Muhammad Zubair Irshad, Sergey Zakharov, Yue Wang, Sergey Levine, Chelsea Finn, Wei-Chiu Ma, Dhruv Shah, Abhishek Gupta, Karl Pertsch

Source: arXiv. Website: https://polaris-evals.github.io/

A significant challenge for robot learning research is accurately measuring and comparing the performance of robot policies. Benchmarking in robotics is historically challenging because real-world rollouts are stochastic, hard to reproduce, and time-consuming. This challenge is exacerbated for recent generalist policies, which have to be evaluated across a wide variety of scenes and tasks. Evaluation in simulation offers a scalable complement to real-world evaluations, but the visual and physical domain gap between existing simulation benchmarks and the real world has made them an unreliable signal for policy improvement. Furthermore, building realistic and diverse simulated environments has traditionally required significant human effort and expertise. To bridge the gap, we introduce Policy Evaluation and Environment Reconstruction in Simulation (PolaRiS), a scalable real-to-sim framework for high-fidelity simulated robot evaluation. PolaRiS utilizes neural reconstruction methods to turn short video scans of real-world scenes into interactive simulation environments. Additionally, we develop a simple simulation data co-training recipe that bridges remaining real-to-sim gaps and enables zero-shot evaluation in unseen simulation environments. Through extensive paired evaluations between simulation and the real world, we demonstrate that PolaRiS evaluations correlate much more strongly with real-world generalist policy performance than existing simulated benchmarks do. The simplicity of PolaRiS also enables rapid creation of diverse simulated environments. As such, this work takes a step towards distributed and democratized evaluation for the next generation of robotic foundation models.
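To make the notion of a "paired evaluation" concrete, the following is a minimal sketch of how per-policy success rates measured in simulation and in the real world could be compared. It assumes the Pearson correlation coefficient as the agreement statistic; the abstract does not state which statistic the authors use. The function name `pearson_r` and the success-rate values are hypothetical placeholders, not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical success rates for four policies, made up for illustration only.
# Entry i of each list refers to the same policy evaluated on the same task.
sim_success = [0.2, 0.4, 0.6, 0.8]
real_success = [0.1, 0.3, 0.5, 0.7]

r = pearson_r(sim_success, real_success)
print(f"sim-vs-real Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that policies that do better in simulation also tend to do better in the real world, which is the property a simulated benchmark needs in order to guide policy improvement. A rank-based statistic could be substituted if only the ordering of policies matters.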
