Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples of the same dimension as the global variable and/or evaluation of the global cost function, which may induce high estimation variance in large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm that leverages the network structure inherent in the optimization objective, allowing each agent to estimate its local gradient independently through local cost evaluation, without using any consensus protocol. The proposed algorithm adopts an asynchronous update scheme and is designed, via the block coordinate descent method, for stochastic non-convex optimization over a possibly non-convex feasible domain. We then employ the algorithm as a distributed model-free RL method for distributed linear quadratic regulator (LQR) design, where a learning graph is introduced to describe the required interaction relationships among agents in distributed learning. We empirically validate the proposed algorithm, benchmarking its convergence rate and variance against a centralized ZOO algorithm.
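To make the local estimation idea concrete, below is a minimal Python sketch of a two-point zeroth-order gradient estimator restricted to one agent's block of variables. The function name `zo_local_gradient`, the callable `local_cost`, and the smoothing radius `delta` are illustrative assumptions, not the paper's exact construction; the sketch only demonstrates that the random perturbation matches the dimension of the local block and that only local cost evaluations are needed.

```python
import numpy as np

def zo_local_gradient(local_cost, x_block, delta=1e-2, rng=None):
    """Two-point zeroth-order estimate of the gradient of a local cost
    with respect to an agent's own block of variables (a sketch, not
    the paper's exact estimator).

    local_cost : callable mapping the agent's block to a scalar cost
    x_block    : current value of the agent's block (1-D array)
    delta      : smoothing radius of the random perturbation
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x_block.size
    # Random direction on the unit sphere, with the dimension of the
    # LOCAL block only -- not of the global variable, which is the
    # source of the claimed variance reduction.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    # Two evaluations of the agent's own cost; no global cost function
    # or consensus protocol is involved.
    f_plus = local_cost(x_block + delta * u)
    f_minus = local_cost(x_block - delta * u)
    return d * (f_plus - f_minus) / (2.0 * delta) * u

# Illustrative usage on a quadratic local cost (hypothetical example):
Q = np.diag([1.0, 2.0, 3.0])
cost = lambda z: z @ Q @ z
x = np.ones(3)
g_hat = zo_local_gradient(cost, x)   # noisy estimate of 2 * Q @ x
```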