This paper introduces two learning schemes for distributed agents in Reinforcement Learning (RL) environments: Reward-Weighted (R-Weighted) and Loss-Weighted (L-Weighted) gradient merger. The R/L-Weighted methods replace standard practices for training multiple agents, such as summing or averaging the gradients. The core of our methods is to scale the gradient of each actor according to how high its reward (for R-Weighted) or its loss (for L-Weighted) is compared to those of the other actors. During training, each agent operates in a differently initialized version of the same environment, so different actors produce different gradients. In essence, the R-Weights and L-Weights of each agent inform the other agents of its potential, which in turn indicates which environment should be prioritized for learning. This approach to distributed learning works because environments that yield higher rewards or lower losses carry more critical information than environments that yield lower rewards or higher losses. We empirically demonstrate that the R-Weighted methods outperform the state of the art in multiple RL environments.
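The weighting idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact weighting formula, so the softmax normalization, the `temperature` parameter, and the function names `softmax` and `merge_gradients` are all hypothetical choices made here for illustration. Gradients are represented as flat lists of floats to keep the example dependency-free.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax; returns weights that sum to 1.

    Assumption: softmax is one plausible way to turn per-actor scores
    into relative weights. The abstract does not specify the formula.
    """
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_gradients(gradients, scores, temperature=1.0):
    """Merge per-actor gradients as a score-weighted sum.

    gradients: one flat list of floats per actor, all the same length.
    scores: one scalar per actor (e.g. episode reward for an
            R-Weighted-style merge). Higher score means larger weight.
    Returns the merged gradient and the weights that were used.
    """
    weights = softmax(scores, temperature)
    merged = [0.0] * len(gradients[0])
    for weight, grad in zip(weights, gradients):
        for i, g in enumerate(grad):
            merged[i] += weight * g
    return merged, weights


# Three hypothetical actors, each run in a differently initialized
# copy of the same environment (values are made up for illustration).
grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rewards = [1.0, 2.0, 3.0]
merged, weights = merge_gradients(grads, rewards)
```

With equal scores the weights are uniform, so the merge reduces to plain gradient averaging, the baseline the abstract says these methods replace. For a loss-based variant, the same function could be called with per-actor losses, or with negated losses to favor low-loss actors; which direction the paper uses is not stated in the abstract, so either choice here is an assumption.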