Modeling coordination among generative agents in complex multi-round decision-making is a core challenge for AI and operations management. Although behavioral experiments have revealed the cognitive biases behind supply chain inefficiencies, traditional methods face limitations in scalability and experimental control. We introduce a scalable experimental paradigm that uses Large Language Models (LLMs) to simulate multi-stage supply chain dynamics. Grounded in a Hierarchical Reasoning Framework, this study analyzes the impact of cognitive heterogeneity on agent interactions. Unlike prior homogeneous settings, we employ DeepSeek and GPT agents to systematically vary reasoning sophistication across supply chain tiers. Through rigorously replicated and statistically validated simulations, we investigate how this cognitive diversity influences collective outcomes. Results indicate that agents exhibit myopic, self-interested behaviors that exacerbate systemic inefficiencies; however, we demonstrate that information sharing effectively mitigates these adverse effects. Our findings extend traditional behavioral methods and offer new insights into the dynamics of AI-enabled organizations. This work underscores both the potential and the limitations of LLM-based agents as proxies for human decision-making in complex operational environments.