Deep reinforcement learning (DRL) is one of the most powerful tools for synthesizing complex robotic behaviors. However, training DRL models is highly compute- and memory-intensive, requiring large training datasets and replay buffers to achieve good performance. This poses a challenge for the next generation of field robots, which will need to learn on the edge to adapt to their environments. In this paper, we begin to address this issue through observation space quantization. We evaluate our approach on four simulated robot locomotion tasks with two state-of-the-art DRL algorithms, the on-policy Proximal Policy Optimization (PPO) and the off-policy Soft Actor-Critic (SAC), and find that observation space quantization reduces overall memory costs by as much as 4.2x without impacting learning performance.
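To make the idea of observation space quantization concrete, the following is a minimal sketch, not the scheme used in the paper. It assumes uniform affine quantization of bounded observations to 8-bit codes before they are stored (for example in a replay buffer), with dequantization when they are read back. The function names `quantize` and `dequantize`, the observation bounds, and the choice of 8 bits are all illustrative assumptions.

```python
from array import array


def quantize(obs, low, high, bits=8):
    """Uniformly map floats in [low, high] to unsigned integer codes.

    Hypothetical helper for illustration; the bounds and bit width are assumptions.
    """
    if not 1 <= bits <= 8:
        raise ValueError("this sketch stores one byte per element, so bits must be in 1..8")
    if high <= low:
        raise ValueError("high must be greater than low")
    levels = (1 << bits) - 1           # number of quantization steps
    scale = (high - low) / levels      # width of one step
    codes = array("B")                 # unsigned 8-bit storage
    for x in obs:
        clipped = min(max(x, low), high)   # out-of-range values saturate
        codes.append(round((clipped - low) / scale))
    return codes


def dequantize(codes, low, high, bits=8):
    """Recover approximate float observations from integer codes."""
    levels = (1 << bits) - 1
    scale = (high - low) / levels
    return [low + c * scale for c in codes]


# Example observation stored as 32-bit floats (values are made up).
obs = array("f", [-1.0, -0.25, 0.0, 0.5, 1.0])
codes = quantize(obs, low=-1.0, high=1.0)
restored = dequantize(codes, low=-1.0, high=1.0)

# Storage cost per observation: 4 bytes per element versus 1 byte per element.
bytes_fp32 = obs.itemsize * len(obs)
bytes_uint8 = codes.itemsize * len(codes)

# Worst-case reconstruction error is about half a quantization step.
max_error = max(abs(a - b) for a, b in zip(obs, restored))
```

In this sketch the five-element observation takes 20 bytes as 32-bit floats and 5 bytes as 8-bit codes, while the reconstruction error stays within roughly half a quantization step. How such per-element savings translate into overall training memory depends on what else is held in memory, which this sketch does not model.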