We derive a minimax distributionally robust inverse reinforcement learning (IRL) algorithm to reconstruct the utility functions of a multi-agent sensing system. Specifically, we construct utility estimators that minimize the worst-case prediction error over a Wasserstein ambiguity set centered at noisy signal observations. We prove that this robust estimation problem is equivalent to a semi-infinite optimization reformulation, and we propose a consistent algorithm to compute its solutions. We illustrate the efficacy of this robust IRL scheme in numerical studies that reconstruct the utility functions of a cognitive radar network from observed tracking signals.
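For orientation, the minimax estimation problem described above can be sketched in the generic form used for Wasserstein distributionally robust optimization. The notation here is an assumption for illustration and is not taken from the paper: $\theta$ denotes the utility parameters, $\ell$ a prediction-error loss, $\hat{P}_N$ the empirical distribution of the $N$ noisy signal observations, $W$ a Wasserstein distance, and $\varepsilon$ the radius of the ambiguity set.

```latex
% Generic Wasserstein DRO template (illustrative notation, assumed):
% choose the estimator that minimizes the worst-case expected loss
% over all distributions within radius \varepsilon of the empirical one.
\[
  \hat{\theta} \in \arg\min_{\theta \in \Theta} \;
  \sup_{Q \in \mathcal{B}_{\varepsilon}(\hat{P}_N)} \;
  \mathbb{E}_{\xi \sim Q}\bigl[\ell(\theta;\xi)\bigr],
  \qquad
  \mathcal{B}_{\varepsilon}(\hat{P}_N)
  = \bigl\{\, Q : W(Q,\hat{P}_N) \le \varepsilon \,\bigr\}.
\]
```

The inner supremum ranges over infinitely many distributions, which is why problems of this type are typically recast in a more tractable form, such as the semi-infinite reformulation mentioned above.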