Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second

We present Galactic, a large-scale simulation and reinforcement-learning (RL) framework for robotic mobile manipulation in indoor environments. Specifically, a Fetch robot (equipped with a mobile base, 7DoF arm, RGBD camera, egomotion, and onboard sensing) is spawned in a home environment and asked to rearrange objects by navigating to an object, picking it up, navigating to a target location, and then placing the object at the target location. Galactic is fast. In terms of simulation speed (rendering + physics), Galactic achieves over 421,000 steps-per-second (SPS) on an 8-GPU node, which is 54x faster than Habitat 2.0 (7,699 SPS). More importantly, Galactic was designed to optimize the entire rendering + physics + RL interplay, since any bottleneck in the interplay slows down training. In terms of simulation+RL speed (rendering + physics + inference + learning), Galactic achieves over 108,000 SPS, which is 88x faster than Habitat 2.0 (1,243 SPS). These massive speed-ups not only drastically cut the wall-clock training time of existing experiments, but also unlock an unprecedented scale of new experiments. First, Galactic can train a mobile pick skill to >80% accuracy in under 16 minutes, a 100x speedup compared to the over 24 hours it takes to train the same skill in Habitat 2.0. Second, we use Galactic to perform the largest-scale experiment to date for rearrangement, using 5B steps of experience in 46 hours, which is equivalent to 20 years of robot experience. This scaling results in a single neural network composed of task-agnostic components achieving 85% success in GeometricGoal rearrangement, compared to 0% success reported in Habitat 2.0 for the same approach. The code is available at github.com/facebookresearch/galactic.
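The speed-up factors quoted above can be sanity-checked from the stated throughput figures. The following is a minimal sketch and not part of the Galactic codebase: the helper name `speedup` is hypothetical, and the inputs are the rounded lower bounds given in the abstract ("over 421,000", "over 108,000", "under 16 minutes", "over 24 hours"), so the computed ratios are approximate and can fall slightly below the reported factors.

```python
def speedup(new_rate: float, baseline_rate: float) -> float:
    """Return how many times faster new_rate is than baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return new_rate / baseline_rate


# Simulation only (rendering + physics), steps per second from the abstract.
sim_speedup = speedup(421_000, 7_699)

# Simulation + RL (rendering + physics + inference + learning).
# 108,000 is a quoted lower bound, so this ratio is a lower bound too.
train_speedup = speedup(108_000, 1_243)

# Mobile pick skill: 24 hours versus 16 minutes, both expressed in minutes.
# Both figures are bounds ("over" and "under"), so the true ratio is larger.
pick_speedup = (24 * 60) / 16

print(f"simulation speed-up:      ~{sim_speedup:.1f}x")
print(f"simulation+RL speed-up:   ~{train_speedup:.1f}x")
print(f"pick-skill wall-clock:    >={pick_speedup:.0f}x")
```

With these rounded inputs, the simulation ratio comes out just under 55x and the simulation+RL ratio just under 87x. The latter matches the reported 88x only if the measured throughput is somewhat above the quoted 108,000 SPS floor, which the word "over" allows.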
