A computing cluster that interconnects multiple compute nodes can be used to accelerate distributed reinforcement learning based on DQN (Deep Q-Network). In distributed reinforcement learning, Actor nodes acquire experiences by interacting with a given environment, and a Learner node optimizes the DQN model. Since the volume of data transferred between Actor and Learner nodes grows with the number of Actor nodes and the size of their experiences, the communication overhead between them is one of the major performance bottlenecks. In this paper, this communication is accelerated by network optimizations based on DPDK (Data Plane Development Kit), and a DPDK-based low-latency experience replay memory server is deployed between Actor and Learner nodes interconnected by a 40GbE (40 Gbit Ethernet) network. Evaluation results show that, as one network optimization technique, kernel bypassing with DPDK reduces network access latencies to a shared memory server by 32.7% to 58.9%. As another network optimization technique, an in-network experience replay memory server placed between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1% and communication latencies for prioritized experience sampling by 21.9% to 29.1%.
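To make the role of the experience replay memory concrete, the following is a minimal in-process sketch of a replay memory with prioritized sampling. It is an illustration only: the abstract does not specify the sampling scheme, so proportional prioritization is assumed here, and the class name `PrioritizedReplayMemory` and its methods are hypothetical. The sketch involves no DPDK or network code; in the system described above, Actor nodes would issue the equivalent of `push` and the Learner node the equivalent of `sample` over the network.

```python
import random


class PrioritizedReplayMemory:
    """Fixed-capacity replay memory (hypothetical name, illustration only).

    Assumes proportional prioritization: an experience is drawn with
    probability proportional to its priority.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []        # stored experiences
        self.priorities = []   # one non-negative priority per experience
        self.next_slot = 0     # slot to overwrite once the memory is full

    def push(self, experience, priority=1.0):
        """Actor side: store an experience, overwriting the oldest when full."""
        if len(self.items) < self.capacity:
            self.items.append(experience)
            self.priorities.append(priority)
        else:
            self.items[self.next_slot] = experience
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        """Learner side: draw a batch (with replacement) weighted by priority."""
        indices = rng.choices(
            range(len(self.items)), weights=self.priorities, k=batch_size
        )
        return indices, [self.items[i] for i in indices]

    def update_priorities(self, indices, new_priorities):
        """Learner side: write back new priorities after a training step."""
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = p
```

Each `sample` call is followed by an `update_priorities` call in this sketch, so one training step costs two round trips to the memory. This is one reason the access latency to the replay memory matters for the Learner.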