We propose an active learning scheme to reconstruct the private strategies executed by a population of interacting agents and to predict an exact outcome of the underlying multi-agent interaction process, here identified as a stationary action profile. We envision a scenario where an external observer, endowed with a learning procedure, can make queries and observe the agents' reactions through private action-reaction mappings, whose collective fixed point corresponds to a stationary profile. By iteratively collecting informative data and updating parametric estimates of the action-reaction mappings, we establish sufficient conditions on the asymptotic behavior of the proposed active learning method under which, if it converges, it can only converge to a stationary action profile. This fact has two main consequences: i) learning locally-exact surrogates of the action-reaction mappings allows the external observer to succeed in its prediction task, and ii) because the assumptions are so general that a stationary profile is not even guaranteed to exist, the established sufficient conditions also act as certificates for the existence of such a profile. Extensive numerical simulations involving typical competitive multi-agent control and decision-making problems illustrate the practical effectiveness of the proposed learning-based approach.
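The query-observe-update loop described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' algorithm: two agents with scalar actions, the hypothetical private mappings in `react`, affine surrogates fitted by least squares on the two most recent samples, and the names `fit_affine`, `surrogate_fixed_point`, and `active_learning`. The toy mappings are chosen to be contractive so that the loop converges; the abstract itself only claims that a limit, if one exists, is stationary.

```python
import numpy as np

# Hypothetical private action-reaction mappings (hidden from the observer).
def react(x):
    """Return each agent's reaction to the queried action of the other agent."""
    x1, x2 = x
    return np.array([1.0 + 0.5 * np.tanh(x2),   # agent 1 reacts to x2
                     0.5 - 0.3 * x1])           # agent 2 reacts to x1

def fit_affine(inputs, outputs):
    """Least-squares affine fit y ~ a*u + b (minimum-norm if underdetermined)."""
    U = np.column_stack([inputs, np.ones(len(inputs))])
    (a, b), *_ = np.linalg.lstsq(U, np.asarray(outputs), rcond=None)
    return a, b

def surrogate_fixed_point(theta):
    """Solve x1 = a1*x2 + b1, x2 = a2*x1 + b2 for the surrogate mappings."""
    (a1, b1), (a2, b2) = theta
    A = np.array([[1.0, -a1], [-a2, 1.0]])
    return np.linalg.solve(A, np.array([b1, b2]))

def active_learning(x0, window=2, tol=1e-10, max_iter=50):
    """Return (profile, number of queries made, whether stationarity was observed)."""
    x = np.asarray(x0, dtype=float)
    queries, reactions = [], []
    for k in range(max_iter):
        y = react(x)                       # query the agents, observe reactions
        queries.append(x)
        reactions.append(y)
        if np.max(np.abs(y - x)) < tol:    # reactions reproduce the query
            return x, k + 1, True
        # Refit each surrogate on the most recent samples, so that it
        # interpolates the observed reactions near the latest query.
        Q = np.array(queries[-window:])
        R = np.array(reactions[-window:])
        theta = [fit_affine(Q[:, 1], R[:, 0]),   # agent 1: input x2
                 fit_affine(Q[:, 0], R[:, 1])]   # agent 2: input x1
        x = surrogate_fixed_point(theta)   # next query: surrogate fixed point
    return x, max_iter, False

x_star, n_queries, converged = active_learning([0.0, 0.0])
print(x_star, n_queries, converged)
```

Because each surrogate passes through the latest observed sample, a query sequence that settles down can only settle at a point where the observed reactions equal the query, which is the stationarity condition checked in the loop.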