Multi-Agent Reinforcement Learning (MARL) is an increasingly important research field that can model and control multiple large-scale autonomous systems. Despite its achievements, existing multi-agent learning methods typically incur high computational costs in training time and power, arising from large observation-action spaces and a huge number of training steps. A key challenge is therefore to understand and characterize the computationally intensive functions in several popular classes of MARL algorithms during their training phases. Our preliminary experiments yield new insights into the key modules of MARL algorithms that limit the adoption of MARL in real-world systems. We explore a neighbor sampling strategy to improve cache locality and observe a performance improvement ranging from 26.66% (3 agents) to 27.39% (12 agents) during the computationally intensive mini-batch sampling phase. Additionally, we demonstrate that improving locality reduces end-to-end training time by 10.2% (for 12 agents) compared to existing multi-agent algorithms, without significant degradation in the mean reward.
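The contrast between uniform mini-batch sampling and a neighbor-based variant can be illustrated with a minimal sketch. The abstract does not specify the exact scheme, so everything here is an assumption made for illustration: the function names `random_indices`, `neighbor_indices`, and `count_jumps` are hypothetical, the replay buffer is assumed to be a contiguous array indexed by position, and "neighbors" is taken to mean transitions stored at consecutive indices.

```python
import random


def random_indices(buffer_len, batch_size, rng):
    """Baseline (assumed): draw every index independently.

    Accesses are scattered across the buffer, so consecutive reads
    rarely touch adjacent memory.
    """
    return [rng.randrange(buffer_len) for _ in range(batch_size)]


def neighbor_indices(buffer_len, batch_size, neighbors, rng):
    """Hypothetical neighbor sampling: draw batch_size // neighbors random
    anchors and take `neighbors` consecutive indices starting at each one.

    Reads within a run touch adjacent buffer slots, which is the property
    expected to help cache locality.
    """
    if neighbors <= 0 or batch_size % neighbors != 0:
        raise ValueError("batch_size must be a positive multiple of neighbors")
    if neighbors > buffer_len:
        raise ValueError("neighbors must not exceed buffer_len")
    indices = []
    for _ in range(batch_size // neighbors):
        # Choose the anchor so the whole run stays inside the buffer.
        start = rng.randrange(buffer_len - neighbors + 1)
        indices.extend(range(start, start + neighbors))
    return indices


def count_jumps(indices):
    """Rough locality proxy: number of steps that do not move to the
    immediately following buffer slot."""
    return sum(1 for a, b in zip(indices, indices[1:]) if b != a + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    scattered = random_indices(100_000, 1024, rng)
    clustered = neighbor_indices(100_000, 1024, 16, rng)
    print("jumps with random sampling:  ", count_jumps(scattered))
    print("jumps with neighbor sampling:", count_jumps(clustered))
```

With 1024 samples drawn in runs of 16, the neighbor variant makes at most 63 non-adjacent jumps, while independent sampling jumps on almost every step. Consecutive transitions are more correlated than independent draws, which is presumably why the abstract reports the effect on mean reward alongside the speedup.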