Trajectory guidance requires a leader robotic agent to assist a follower robotic agent so that the two cooperatively reach a target destination. However, planning this cooperation becomes difficult when the leader serves a family of different followers and has incomplete information about them. The leader therefore needs to learn different cooperation plans and adapt them quickly. We develop a Stackelberg meta-learning approach to address this challenge. We first formulate the guided trajectory planning problem as a dynamic Stackelberg game to capture the leader-follower interactions. We then leverage meta-learning to develop cooperative strategies for different followers. The leader learns a meta-best-response model from a prescribed set of followers. When a specific follower initiates a guidance query, the leader quickly adapts the meta-model to a follower-specific model using a small amount of learning data and uses the adapted model to perform trajectory guidance. Simulations show that our method achieves better generalization and adaptation performance in learning followers' behavior than other learning approaches. A comparison with zero-guidance scenarios further demonstrates the value and effectiveness of guidance.
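The learn-then-adapt idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a toy scalar follower whose best response to a leader action u is r = a*u + b, and it swaps in a simple first-order (Reptile-style) meta-update in place of whatever meta-learning procedure the authors use. All names (`adapt`, `meta_train`, `sample_data`) and all numeric settings are hypothetical.

```python
import random

def mse_and_grad(theta, data):
    """MSE of the linear response model r = a*u + b, and its gradient."""
    a, b = theta
    n = len(data)
    errs = [(a * u + b - r, u) for u, r in data]
    loss = sum(e * e for e, _ in errs) / n
    grad_a = sum(2 * e * u for e, u in errs) / n
    grad_b = sum(2 * e for e, _ in errs) / n
    return loss, (grad_a, grad_b)

def adapt(theta, data, lr=0.1, steps=5):
    """Few gradient steps from theta on one follower's data (inner loop)."""
    for _ in range(steps):
        _, (ga, gb) = mse_and_grad(theta, data)
        theta = (theta[0] - lr * ga, theta[1] - lr * gb)
    return theta

def sample_data(follower, k, rng):
    """Assumed toy follower: noise-free response r = a*u + b to action u."""
    a, b = follower
    actions = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    return [(u, a * u + b) for u in actions]

def meta_train(followers, rng, meta_lr=0.1, iters=500, k=10):
    """Reptile-style outer loop: nudge the meta-parameters toward the
    parameters obtained by adapting to a randomly sampled follower."""
    theta = (0.0, 0.0)
    for _ in range(iters):
        follower = rng.choice(followers)
        adapted = adapt(theta, sample_data(follower, k, rng))
        theta = (theta[0] + meta_lr * (adapted[0] - theta[0]),
                 theta[1] + meta_lr * (adapted[1] - theta[1]))
    return theta

def param_error(theta, follower):
    """Euclidean distance between model parameters and the true follower."""
    return ((theta[0] - follower[0]) ** 2 + (theta[1] - follower[1]) ** 2) ** 0.5

rng = random.Random(0)
followers = [(1.0, 0.2), (1.5, -0.1), (2.0, 0.0)]  # prescribed follower set (made up)
meta_theta = meta_train(followers, rng)

new_follower = (1.7, 0.1)                      # follower issuing a guidance query
few_shot = sample_data(new_follower, 5, rng)   # small amount of learning data
theta_new = adapt(meta_theta, few_shot)        # follower-specific model

loss_before, _ = mse_and_grad(meta_theta, few_shot)
loss_after, _ = mse_and_grad(theta_new, few_shot)
print("few-shot loss before/after adaptation:", loss_before, loss_after)
print("parameter error before/after:",
      param_error(meta_theta, new_follower), param_error(theta_new, new_follower))
```

In this toy setting the adapted model would then be handed to the leader's planner in place of the unknown follower response. The Stackelberg game itself, that is, the leader optimizing its trajectory against the learned response, is not modeled here.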