Game-theoretic inverse learning is the problem of inferring players' objectives from their actions. We formulate an inverse learning problem in a Stackelberg game between a leader and a follower, where each player's action is the trajectory of a dynamical system. We propose an active inverse learning method for the leader to infer which hypothesis among a finite set of candidates describes the follower's objective function. Instead of relying on passively observed trajectories, as existing methods do, the proposed method actively maximizes the differences in the follower's trajectories under different hypotheses to accelerate the leader's inference. We demonstrate the proposed method in a receding-horizon repeated trajectory game. Compared with uniformly random inputs, the leader inputs provided by the proposed method accelerate, by orders of magnitude, the convergence of the probability of each hypothesis conditioned on the follower's trajectory.
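The idea of choosing leader inputs that separate the hypotheses can be illustrated with a minimal sketch. It is not the paper's formulation: each hypothesis is assumed to predict the follower's response through a static linear map rather than through the solution of a trajectory optimization, the leader chooses from a small finite set of candidate inputs, and the observation noise is assumed to be isotropic Gaussian. The names `select_input`, `bayes_update`, and `sigma` are hypothetical.

```python
import numpy as np

def select_input(candidates, responses):
    """Return the candidate leader input whose predicted follower responses
    differ most across hypotheses (sum of pairwise squared distances)."""
    def spread(u):
        preds = [r(u) for r in responses]
        return sum(
            np.sum((preds[i] - preds[j]) ** 2)
            for i in range(len(preds))
            for j in range(i + 1, len(preds))
        )
    return max(candidates, key=spread)

def bayes_update(prior, u, y_obs, responses, sigma=1.0):
    """Posterior over hypotheses after observing the follower response y_obs
    to leader input u, assuming isotropic Gaussian noise (an assumption)."""
    log_lik = np.array(
        [-np.sum((y_obs - r(u)) ** 2) / (2 * sigma**2) for r in responses]
    )
    log_post = np.log(prior) + log_lik
    log_post -= log_post.max()  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Toy stand-ins for the follower's response under two hypotheses (assumed).
A = [np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[1.0, 0.0], [0.0, 3.0]])]
responses = [lambda u, M=M: M @ u for M in A]

# The hypotheses agree on the first input and disagree on the second.
candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
prior = np.array([0.5, 0.5])

u_active = select_input(candidates, responses)
y_obs = responses[1](u_active)  # noise-free observation generated by hypothesis 1
posterior_active = bayes_update(prior, u_active, y_obs, responses)

# An uninformative input leaves the belief unchanged.
u_passive = candidates[0]
posterior_passive = bayes_update(prior, u_passive, responses[1](u_passive), responses)
```

In this toy setting the actively selected input shifts the belief toward the hypothesis that generated the data after a single observation, while the input on which both hypotheses agree leaves the prior unchanged.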