In many applications involving multi-agent systems (MAS), it is imperative to test an experimental (Exp) autonomous agent in a high-fidelity simulator before deploying it to production, to avoid unexpected losses in the real world. Such a simulator, called an agent-based simulator (ABS), plays the role of the environmental background (BG) agents and aims to replicate the complex real MAS. However, developing a realistic ABS remains challenging, mainly because of the sequential and dynamic nature of such systems. To fill this gap, we propose a metric that distinguishes between real and synthetic multi-agent systems. The metric is evaluated through live interaction between the Exp and BG agents, so that it explicitly accounts for the systems' sequential nature. Specifically, we characterize the system/environment by studying the effect of a sequence of BG agents' responses on the evolution of the environment state, and we take the difference between these effects as the MAS distance metric. The effect estimation is cast as a causal inference problem, since the environment evolution is confounded by the previous environment state. Importantly, we propose the Interactive Agent-Guided Simulation (INTAGS) framework, which builds a realistic ABS by optimizing over this novel metric. To adapt to any environment with interactive sequential decision-making agents, INTAGS formulates the simulator as a stochastic policy in reinforcement learning. Moreover, INTAGS uses the policy gradient update to avoid differentiating the proposed metric, so that it can support the non-differentiable operations of multi-agent environments. Through extensive experiments, we demonstrate the effectiveness of INTAGS on an equity stock market simulation example. We show that calibrating the simulator with INTAGS generates more realistic market data than the state-of-the-art conditional Wasserstein Generative Adversarial Network approach.
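The idea of using a policy gradient update to sidestep differentiating the metric can be illustrated with a minimal toy sketch. This is not the INTAGS implementation: the one-parameter Gaussian "simulator", the rounding-based `toy_distance`, and every name and hyperparameter below are assumptions chosen for illustration only. The sketch shows a score-function (REINFORCE) update, which needs only samples of the metric value and the gradient of the policy's log-probability, so the metric itself may be non-differentiable.

```python
import numpy as np


def toy_distance(actions, real_value):
    """Hypothetical non-differentiable discrepancy.

    Actions are snapped to a 0.5 grid before comparison, so the metric is
    piecewise constant and its derivative is zero almost everywhere.
    """
    return np.abs(np.round(actions * 2.0) / 2.0 - real_value)


def calibrate(real_value=2.0, theta=-1.0, sigma=0.5, lr=0.05,
              steps=500, batch=64, seed=0):
    """Tune the mean `theta` of a Gaussian policy N(theta, sigma^2).

    Assumption: the simulator is a stochastic policy with a single
    parameter, and the reward is the negative toy distance.
    """
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        # Sample simulator outputs from the current stochastic policy.
        actions = rng.normal(theta, sigma, size=batch)
        # Reward is the negative distance; no gradient of it is required.
        rewards = -toy_distance(actions, real_value)
        # A mean baseline reduces the variance of the gradient estimate.
        baseline = rewards.mean()
        # Gradient of log N(a; theta, sigma^2) with respect to theta.
        score = (actions - theta) / sigma**2
        grad = np.mean((rewards - baseline) * score)
        # Gradient ascent on the expected reward.
        theta += lr * grad
    return theta


if __name__ == "__main__":
    print("calibrated theta:", calibrate())
```

With these assumed settings, the calibrated mean should settle close to the assumed real value of 2.0, even though the distance is never differentiated.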