A Survey of Sim-to-Real Methods in RL: Progress, Prospects and Challenges with Foundation Models

Deep reinforcement learning (RL) has proven effective for decision-making tasks in domains such as robotics, transportation, and recommender systems. An RL agent learns by interacting with an environment and updating its policy from the collected experience. However, because real-world data are limited and harmful actions can have unacceptable consequences, RL policies are mostly trained in simulators. This practice keeps learning safe but introduces an inevitable sim-to-real gap at deployment, which degrades performance and creates risks in execution. Sim-to-real problems have been tackled in different domains with a variety of techniques, and emerging foundation models, including large language models, have recently opened new directions for sim-to-real transfer. To the best of our knowledge, this survey is the first to present a taxonomy that formally frames sim-to-real techniques around the key elements of the Markov Decision Process (State, Action, Transition, and Reward). Based on this framework, we cover a comprehensive body of literature, from classic to state-of-the-art methods, including sim-to-real techniques empowered by foundation models, and we discuss the domain-specific considerations that deserve attention in different sim-to-real problems. We then summarize the formal evaluation process for sim-to-real performance, together with accessible code or benchmarks. We also present challenges and opportunities to encourage future exploration in this direction. We actively maintain a repository of the most up-to-date sim-to-real research to help researchers in the field.
