Simulation is essential to reinforcement learning (RL) before deployment in the real world, especially for safety-critical applications such as robot manipulation. RL agents are conventionally sensitive to discrepancies between simulation and the real world, known as the sim-to-real gap. Domain randomization, a technique used to bridge this gap, is limited to imposing heuristically randomized models. We investigate the properties of the intrinsic stochasticity of real-time simulation (RT-IS) in off-the-shelf simulation software and its potential to improve RL performance. This improvement includes a higher tolerance to noise and model imprecision, as well as advantages over conventional domain randomization in ease of use and automation. First, we conduct analytical studies to measure the correlation of RT-IS with computer hardware utilization and validate its comparability with the natural stochasticity of a physical robot. Then, we exploit RT-IS in the training of an RL agent. The simulation and physical experiment results verify the feasibility and applicability of RT-IS to robust agent training for robot manipulation tasks. The RT-IS-powered RL agent outperforms conventional agents on robots with modeling uncertainties. RT-IS requires less heuristic randomization and is not task-dependent, and the resulting agent generalizes better than conventional domain-randomization-powered agents. Our findings provide a new perspective on the sim-to-real problem in practical applications such as robot manipulation tasks.