Sim-to-Real Transfer for Mobile Robots with Reinforcement Learning: from NVIDIA Isaac Sim to Gazebo and Real ROS 2 Robots

Controllers based on deep reinforcement learning (RL) have demonstrated unprecedented agility and dexterous manipulation, with a significant impact on legged and humanoid robots. Modern tooling and simulation platforms, such as NVIDIA Isaac Sim, have enabled these advances. This article demonstrates the application of Isaac Sim to local planning and obstacle avoidance, one of the most fundamental ways in which a mobile robot interacts with its environment. Although proprioception-based RL policies have been researched extensively, approaches based on exteroception are less standardized and less reproducible, and the article focuses on them. The article also aims to provide a base framework for end-to-end local navigation policies and to show how a custom robot can be trained in such a simulation environment. We benchmark the end-to-end policies against Nav2, the state-of-the-art navigation stack in the Robot Operating System (ROS). We also cover the sim-to-real transfer process by demonstrating zero-shot transfer of policies trained in the Isaac simulator to real-world robots. Tests with different simulated robots provide further evidence, showing that the learned policy generalizes. Finally, the benchmarks show performance comparable to Nav2, which opens the door to quick deployment of state-of-the-art end-to-end local planners on custom robot platforms and, importantly, to extending them to more complex missions by expanding the state and action spaces or the task definitions. Overall, this article introduces the most important steps, and the aspects to consider, in deploying RL policies for local path planning and obstacle avoidance, with Isaac Sim for training, Gazebo for testing, and ROS 2 for real-time inference on real robots. The code is available at https://github.com/sahars93/RL-Navigation.
