Strategic multi-agent systems are fundamentally characterized by decentralization, uncertainty, and ambiguity. Agents operating under limited observations must often make decisions based on simplified internal models of the environment, reflecting bounded rationality in both computational capacity and environmental knowledge. The Empirical Evidence Equilibrium (EEE) framework explicitly accounts for these limitations by modeling each agent as forming a potentially misspecified belief derived from signals obtained through partial observations of the environment. The resulting equilibrium concept captures the system's steady state under bounded rationality and decentralization. In this work, we study games in which the environment dynamics are driven jointly by exogenous factors and agents' actions. We analyze agent behavior under Q-value iteration, in which each agent independently forms a belief model, computes Q-values, and derives a greedy strategy, while the collective actions of all agents jointly shape the environment each agent faces at the next stage. We prove that, despite this decentralization, an EEE emerges from the joint dynamics when the coupling between agents' actions and the environment is sufficiently weak. We further extend this result to softmax policies, establishing a contraction result under a sufficient condition on the coupling strength.
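The per-agent computation described above (take a belief model, run Q-value iteration, then act greedily or through a softmax policy) can be illustrated with a minimal sketch. It makes several assumptions that are not in the source: the belief model is a small tabular one, given directly as hypothetical transition probabilities `P` and rewards `R`; the discount factor `gamma` and the softmax `temperature` are illustrative values; and all function names are invented for this example. It covers a single agent under a fixed belief and does not model how beliefs are built from signals or how agents' actions feed back into the environment.

```python
import math

def q_value_iteration(P, R, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Iterate Q(s,a) = R[s][a] + gamma * sum_s' P[s][a][s'] * max_a' Q(s',a')."""
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iters):
        V = [max(row) for row in Q]  # greedy state values under the current Q
        Q_new = [
            [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(n_actions)]
            for s in range(n_states)
        ]
        delta = max(abs(Q_new[s][a] - Q[s][a])
                    for s in range(n_states) for a in range(n_actions))
        Q = Q_new
        if delta < tol:  # stop once successive iterates agree
            break
    return Q

def greedy_policy(Q):
    """Pick, in each state, the action index with the largest Q-value."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]

def softmax_policy(Q, temperature=1.0):
    """Map each row of Q-values to action probabilities via a softmax."""
    policy = []
    for row in Q:
        m = max(row)  # subtract the max for numerical stability
        weights = [math.exp((q - m) / temperature) for q in row]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return policy

# Hypothetical belief model (an assumption for illustration only):
# two states, two actions; P[s][a] is a distribution over next states.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.9, 0.1], [0.2, 0.8]],
]
# Reward depends only on the current state: state 1 pays 1, state 0 pays 0.
R = [[0.0, 0.0], [1.0, 1.0]]

Q = q_value_iteration(P, R, gamma=0.9)
pi_greedy = greedy_policy(Q)
pi_soft = softmax_policy(Q, temperature=1.0)
```

In this toy model, action 1 moves the agent toward the rewarding state more often, so the greedy policy selects it in both states, while the softmax policy assigns it the larger probability without ruling out action 0. A lower `temperature` pushes the softmax policy closer to the greedy one.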