Decentralized learning algorithms are an essential tool for designing multi-agent systems, as they enable agents to learn autonomously from their experience and past interactions. In this work, we propose a theoretical and algorithmic framework for real-time identification of the learning dynamics that govern agent behavior, using only a short burst of a single system trajectory. Our method identifies agent dynamics through polynomial regression and compensates for limited data by incorporating side-information constraints that capture fundamental assumptions or expectations about agent behavior. These constraints are enforced computationally using sum-of-squares optimization, yielding a hierarchy of increasingly accurate approximations of the true agent dynamics. Extensive experiments demonstrate that our approach, using only 5 samples from a short run of a single trajectory, accurately recovers the true dynamics across a range of benchmarks, including equilibrium selection and the prediction of chaotic systems up to 10 Lyapunov times. These findings suggest that our approach has significant potential to support effective policy and decision-making in strategic multi-agent systems.
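The core idea of regression constrained by side information can be illustrated in a much simpler setting. The sketch below is hypothetical and is not the authors' implementation. It assumes a single agent whose mixed strategy x in [0, 1] follows dx/dt = f(x), that noise-free derivative samples are available, and that the side information is that both pure strategies are rest points, i.e. f(0) = f(1) = 0. Instead of sum-of-squares optimization, it enforces this equality constraint exactly by parameterizing f(x) = x(1 - x)q(x) and fitting the polynomial q by ordinary least squares. The names `fit_constrained`, `solve_linear`, and the replicator-style ground truth `true_f` are illustrative choices.

```python
"""Sketch (hypothetical): polynomial regression of 1-D learning dynamics
with side information f(0) = f(1) = 0 enforced by construction.

This swaps the sum-of-squares machinery for a simple reparameterization:
f(x) = x * (1 - x) * q(x), where q has degree `deg`.
"""


def solve_linear(a, b):
    """Solve a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        c[r] = (m[r][n] - sum(m[r][k] * c[k] for k in range(r + 1, n))) / m[r][r]
    return c


def fit_constrained(xs, ys, deg):
    """Least-squares fit of f(x) = x(1-x) * sum_k c_k x^k to samples (xs, ys)."""
    feats = [[x * (1 - x) * x**k for k in range(deg + 1)] for x in xs]
    n = deg + 1
    # Normal equations: (Phi^T Phi) c = Phi^T y
    gram = [[sum(f[i] * f[j] for f in feats) for j in range(n)] for i in range(n)]
    rhs = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(n)]
    coeffs = solve_linear(gram, rhs)
    return lambda x: x * (1 - x) * sum(c * x**k for k, c in enumerate(coeffs))


# Hypothetical ground truth: replicator dynamics of a 2x2 coordination game.
def true_f(x):
    return x * (1 - x) * (2 * x - 1)


xs = [0.1, 0.3, 0.45, 0.6, 0.8]  # five samples from one short run (assumed)
ys = [true_f(x) for x in xs]     # assumed noise-free derivative observations
f_hat = fit_constrained(xs, ys, deg=1)

print(f_hat(0.0) == 0.0, f_hat(1.0) == 0.0)    # True True (rest points by construction)
print(abs(f_hat(0.7) - true_f(0.7)) < 1e-9)    # True (truth lies in the model class)
```

This reparameterization only works for equality-type side information. Inequality-type constraints, such as requiring f to be nonnegative on an interval, cannot be built into the basis this way; that is where a sum-of-squares formulation, solved as a semidefinite program, becomes necessary.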