Perceptive deep reinforcement learning (DRL) has led to many recent breakthroughs for complex AI systems that learn from image-based input data. Applications of these results range from superhuman video game agents to dexterous, physically intelligent robots. However, training these perceptive DRL-enabled systems remains highly compute- and memory-intensive, often requiring huge training datasets and large experience replay buffers. This poses a challenge for the next generation of field robots, which will need to learn on the edge in order to adapt to their environments. In this paper, we begin to address this issue through differentially encoded observation spaces. By reinterpreting stored image-based observations as a video, we leverage lossless differential video encoding schemes to compress the replay buffer without impacting training performance. We evaluate our approach with three state-of-the-art DRL algorithms and find that differential image encoding reduces the memory footprint by as much as 14.2x and 16.7x across tasks from the Atari 2600 benchmark and the DeepMind Control Suite (DMC), respectively. These savings also allow large-scale perceptive DRL that previously required paging between flash and RAM to run entirely in RAM, reducing the latency of DMC tasks by as much as 32%.
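As a rough illustration of the idea of treating stored observations as a video, the sketch below delta-encodes consecutive frames and compresses each residual losslessly. The abstract does not say which encoding scheme is used, so everything here is an assumption made for illustration: frame-to-frame subtraction modulo 256, `zlib` as the compressor, and the hypothetical function names `encode_frames` and `decode_frames`.

```python
import zlib

import numpy as np


def encode_frames(frames: np.ndarray) -> list[bytes]:
    """Delta-encode a (T, H, W, C) uint8 stack and compress each residual.

    Assumption: simple previous-frame subtraction plus zlib, not the
    scheme used in the paper.
    """
    if frames.dtype != np.uint8:
        raise ValueError("expected uint8 frames")
    deltas = frames.copy()
    # uint8 array subtraction wraps modulo 256, so it is exactly invertible.
    deltas[1:] = frames[1:] - frames[:-1]
    # The first entry is the raw key frame; the rest are residuals.
    return [zlib.compress(d.tobytes()) for d in deltas]


def decode_frames(blobs: list[bytes], shape: tuple) -> np.ndarray:
    """Invert encode_frames; `shape` is the per-frame shape (H, W, C)."""
    out = np.empty((len(blobs), *shape), dtype=np.uint8)
    prev = np.zeros(shape, dtype=np.uint8)
    for i, blob in enumerate(blobs):
        delta = np.frombuffer(zlib.decompress(blob), dtype=np.uint8)
        # Adding the residual back (again modulo 256) recovers the frame.
        prev = prev + delta.reshape(shape)
        out[i] = prev
    return out


if __name__ == "__main__":
    # Synthetic episode: a fixed noisy background with a small moving square.
    rng = np.random.default_rng(0)
    background = rng.integers(0, 256, size=(84, 84, 1), dtype=np.uint8)
    episode = np.repeat(background[None], 32, axis=0)
    for t in range(32):
        episode[t, t:t + 4, t:t + 4, 0] = 255

    blobs = encode_frames(episode)
    restored = decode_frames(blobs, episode.shape[1:])
    encoded_bytes = sum(len(b) for b in blobs)
    print("lossless:", np.array_equal(restored, episode))
    print("raw bytes:", episode.nbytes, "encoded bytes:", encoded_bytes)
```

Because consecutive observations in an episode tend to differ in only a few pixels, most residuals are nearly all zeros and compress well, while the round trip stays bit-exact. The savings on real replay buffers depend on the environment and the encoder, so this sketch should not be read as reproducing the reported figures.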