Real-time learning is crucial for robotic agents adapting to ever-changing, non-stationary environments. A common setup for a robotic agent uses two computers at once: a resource-limited local computer tethered to the robot and a powerful remote computer connected wirelessly. Given such a setup, it is unclear to what extent resource limitations affect the performance of a learning system and how the wirelessly connected powerful computer can be used efficiently to compensate for any performance loss. In this paper, we implement a real-time learning system called the Remote-Local Distributed (ReLoD) system to distribute the computations of two deep reinforcement learning (RL) algorithms, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), between a local and a remote computer. We evaluate the system on two vision-based control tasks developed using a robotic arm and a mobile robot. Our results show that SAC's performance degrades heavily on a resource-limited local computer. Strikingly, when all computations of the learning system are deployed on a remote workstation, SAC fails to compensate for the performance loss, indicating that, without careful consideration, using a powerful remote computer may not result in performance improvement. However, a carefully chosen distribution of SAC's computations consistently and substantially improves its performance on both tasks. In contrast, the performance of PPO remains largely unaffected by the distribution of computations. In addition, when all computations happen solely on a powerful tethered computer, the performance of our system remains on par with an existing system that is well-tuned for using a single machine. ReLoD is the only publicly available system for real-time RL that applies to multiple robots for vision-based tasks.