Federated learning (FL) has recently gained much attention due to its effectiveness in speeding up supervised learning tasks under communication and privacy constraints. However, whether similar speedups can be established for reinforcement learning remains much less understood theoretically. To this end, we study a federated policy evaluation problem in which agents communicate via a central aggregator to expedite the evaluation of a common policy. To capture typical communication constraints in FL, we consider finite-capacity uplink channels that can drop packets according to a Bernoulli erasure model. Given this setting, we propose and analyze QFedTD, a quantized federated temporal difference learning algorithm with linear function approximation. Our main technical contribution is a finite-sample analysis of QFedTD that (i) highlights the effect of quantization and erasures on the convergence rate; and (ii) establishes a linear speedup with respect to the number of agents under Markovian sampling. Notably, while different quantization mechanisms and packet drop models have been extensively studied in the federated learning, distributed optimization, and networked control systems literature, our work is the first to provide a non-asymptotic analysis of their effects in multi-agent and federated reinforcement learning.
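To make the setting concrete, the following Python sketch simulates the ingredients named in the abstract: TD(0) with linear function approximation, quantized uplink messages, and Bernoulli packet drops. It is an illustration under stated assumptions, not the update rule analyzed in the paper. The three-state chain, the feature map, the stochastic uniform quantizer, the choice to treat dropped packets as zero contributions, and all function and parameter names (`td_direction`, `stochastic_quantize`, `run_sketch`, `levels`, `success_prob`) are hypothetical choices made for this example.

```python
import math
import random


def td_direction(theta, phi_s, phi_next, reward, gamma):
    """TD(0) update direction for the linear estimate V(s) = phi(s)^T theta."""
    v_s = sum(t * f for t, f in zip(theta, phi_s))
    v_next = sum(t * f for t, f in zip(theta, phi_next))
    td_error = reward + gamma * v_next - v_s
    return [td_error * f for f in phi_s]


def stochastic_quantize(vec, levels, rng):
    """Unbiased stochastic rounding onto a uniform grid (assumed quantizer).

    The scale (largest magnitude) is sent unquantized here for simplicity;
    a true finite-rate channel would have to encode it as well.
    """
    scale = max(abs(x) for x in vec)
    if scale == 0.0:
        return [0.0] * len(vec)
    out = []
    for x in vec:
        scaled = x / scale * levels
        low = math.floor(scaled)
        # Round up with probability equal to the fractional part.
        q = low + (1 if rng.random() < scaled - low else 0)
        out.append(q * scale / levels)
    return out


def run_sketch(num_agents=10, rounds=2000, step=0.05, gamma=0.9,
               levels=8, success_prob=0.8, seed=0):
    """Simulate agents evaluating one policy on a toy 3-state Markov chain."""
    rng = random.Random(seed)
    # Hypothetical chain induced by the common policy, with rewards and features.
    transitions = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
    rewards = [0.0, 1.0, 2.0]
    features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

    theta = [0.0, 0.0]
    states = [0] * num_agents  # each agent follows its own Markovian trajectory
    for _ in range(rounds):
        total = [0.0, 0.0]
        for i in range(num_agents):
            s = states[i]
            s_next = rng.choices(range(3), weights=transitions[s])[0]
            g = td_direction(theta, features[s], features[s_next],
                             rewards[s], gamma)
            q = stochastic_quantize(g, levels, rng)
            # Bernoulli erasure: the packet arrives with probability success_prob.
            if rng.random() < success_prob:
                total = [a + b for a, b in zip(total, q)]
            states[i] = s_next
        # Server step: average over all agents; dropped packets count as zero.
        theta = [t + step * a / num_agents for t, a in zip(theta, total)]
    return theta
```

Calling `run_sketch()` with different values of `num_agents`, `levels`, and `success_prob` gives a rough feel for how the number of agents, the quantization resolution, and the erasure probability interact, although the toy simulation says nothing about the finite-sample rates established in the paper.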