Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management

This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce agent bullwhip: the amplification of run-to-run decision instability in autonomous multi-echelon systems. A central component is decision bullwhip, the portion of order variability generated by stochastic agent decisions rather than by changes in customer demand. We show that decision instability can amplify both across facilities at a fixed point in time and within the same facility over time, even when the demand path is held fixed. Repeated sampling, a natural test-time remedy, fails to meaningfully reduce this instability, suggesting that reliability requires changing the underlying decision policy rather than merely averaging over model outputs. To address this limitation, we propose a Group Relative Policy Optimization (GRPO)-based reinforcement-learning post-training framework that trains a shared base LLM using system-level supply-chain rewards. Post-training substantially reduces tail events, curtails agent bullwhip, and improves the reliability of autonomous supply-chain agents.
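The abstract describes two quantitative ideas without giving formulas: decision instability measured across repeated runs on a fixed demand path, and group-relative rewards used in GRPO post-training. The sketch below illustrates both under stated assumptions. The function names, the array layout `(runs, echelons, periods)`, and the use of cross-run variance as the instability measure are hypothetical choices for illustration, not the paper's definitions. The advantage normalization follows the commonly described GRPO form (reward minus group mean, divided by group standard deviation).

```python
import numpy as np


def run_to_run_variance(orders):
    """Per-echelon order variance across repeated runs.

    orders: array of shape (runs, echelons, periods). Every run is assumed
    to face the same demand path, so variance across runs reflects agent
    decision noise rather than demand changes (an illustrative proxy for
    what the abstract calls decision bullwhip).
    """
    orders = np.asarray(orders, dtype=float)
    # Sample variance across runs for each (echelon, period), then
    # averaged over periods to give one number per echelon.
    return orders.var(axis=0, ddof=1).mean(axis=1)


def amplification_ratios(per_echelon_variance):
    """Ratio of each echelon's variance to the one directly downstream.

    Values above 1 mean instability grows moving upstream. Assumes
    echelons are ordered from retailer (index 0) to factory, and that no
    downstream variance is exactly zero.
    """
    v = np.asarray(per_echelon_variance, dtype=float)
    return v[1:] / v[:-1]


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of sampled rollouts.

    rewards: system-level rewards (for example, negative total supply-chain
    cost) for rollouts generated from the same starting state.
    """
    r = np.asarray(rewards, dtype=float)
    # Normalize within the group; eps guards against a zero-variance group.
    return (r - r.mean()) / (r.std() + eps)


if __name__ == "__main__":
    # Synthetic data only: 20 runs, 4 echelons, 30 periods, with noise
    # that is made to grow upstream purely for demonstration.
    rng = np.random.default_rng(0)
    noise_scale = np.array([1.0, 2.0, 4.0, 8.0]).reshape(1, 4, 1)
    orders = 10.0 + noise_scale * rng.standard_normal((20, 4, 30))

    variance = run_to_run_variance(orders)
    print("per-echelon variance:", variance)
    print("amplification ratios:", amplification_ratios(variance))
    print("advantages:", group_relative_advantages([-120.0, -95.0, -300.0]))
```

In this framing, repeated sampling corresponds to averaging several runs at decision time, whereas post-training changes the policy that generates the runs, which is the distinction the abstract draws.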

