This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., neither data collection nor policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning among multiple learners and a central server, i.e., the Cloud, without sharing their raw data. In each iteration, each learner uploads its local control policy and the corresponding estimated normalized arrival time to the Cloud, which then computes the global optimum among the learners and broadcasts the optimal policy to the learners. Each learner then selects between its local control policy and the one broadcast by the Cloud for the next iteration. The proposed framework leverages derived zero-shot generalization guarantees on arrival time and safety. Theoretical guarantees on almost-sure convergence, almost consensus, Pareto improvement, and optimality gap are also provided. Monte Carlo simulations are conducted to evaluate the proposed framework.
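The upload, broadcast, and select steps of one iteration can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the names `Upload`, `cloud_select`, `learner_select`, and `run_round` are hypothetical, policies are represented by plain strings, local training is omitted, and the learner-side selection rule (adopt the Cloud policy only if the learner's own estimate of it is strictly better) is an assumption, since the abstract does not state the criterion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Policy = str  # hypothetical stand-in for a learned control policy


@dataclass
class Upload:
    learner_id: int
    policy: Policy
    est_arrival_time: float  # estimated normalized arrival time; lower is assumed better


def cloud_select(uploads: List[Upload]) -> Upload:
    """Cloud step: pick the upload with the smallest estimated arrival time."""
    return min(uploads, key=lambda u: u.est_arrival_time)


def learner_select(local: Upload, broadcast: Upload,
                   evaluate: Callable[[Policy], float]) -> Policy:
    """Learner step: keep the local policy or adopt the broadcast one.

    Assumption: the learner re-estimates the broadcast policy with its own
    evaluator and adopts it only if it is strictly better than the local one.
    """
    cloud_time = evaluate(broadcast.policy)
    return broadcast.policy if cloud_time < local.est_arrival_time else local.policy


def run_round(uploads: List[Upload],
              evaluators: Dict[int, Callable[[Policy], float]]) -> List[Policy]:
    """One iteration: only policies and scalar estimates are exchanged, never raw data."""
    best = cloud_select(uploads)
    return [learner_select(u, best, evaluators[u.learner_id]) for u in uploads]


# Toy usage with made-up numbers for three learners.
uploads = [Upload(0, "pi_0", 0.8), Upload(1, "pi_1", 0.5), Upload(2, "pi_2", 0.6)]
evaluators = {
    0: lambda p: 0.55,  # learner 0 finds the Cloud policy better than its own 0.8
    1: lambda p: 0.50,  # learner 1 already holds the broadcast policy
    2: lambda p: 0.70,  # learner 2 finds the Cloud policy worse than its own 0.6
}
next_policies = run_round(uploads, evaluators)
```

In this toy round, learner 0 adopts the broadcast policy while learner 2 keeps its own, which illustrates why the learners need not reach exact consensus after a single iteration.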