Inventory control is a fundamental operations problem in which ordering decisions are traditionally guided by theoretically grounded operations research (OR) algorithms. However, such algorithms often rely on rigid modeling assumptions and can perform poorly when demand distributions shift or relevant contextual information is unavailable. Recent advances in large language models (LLMs) have generated interest in AI agents that can reason flexibly and incorporate rich contextual signals, but it remains unclear how best to integrate LLM-based methods into traditional decision-making pipelines. We study how OR algorithms, LLMs, and humans can interact and complement each other in a multi-period inventory control setting. We construct InventoryBench, a benchmark of over 1,000 inventory instances spanning both synthetic and real-world demand data, designed to stress-test decision rules under demand shifts, seasonality, and uncertain lead times. Through this benchmark, we find that OR-augmented LLM methods outperform either method in isolation, suggesting that the two are complements rather than substitutes. We further investigate the role of humans through a controlled classroom experiment that embeds LLM recommendations into a human-in-the-loop decision pipeline. Contrary to prior findings that human-AI collaboration can degrade performance, we show that, on average, human-AI teams achieve higher profits than either humans or AI agents operating alone. Beyond this population-level finding, we formalize an individual-level complementarity effect and derive a distribution-free lower bound on the fraction of individuals who benefit from AI collaboration; empirically, we find this fraction to be substantial.
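To make the multi-period setting concrete, the following is a minimal sketch of a classical OR-style baseline: a base-stock (order-up-to) policy simulated over a demand sequence, with the level chosen by grid search. All parameter names (`unit_cost`, `price`, `holding_cost`) and the demand distribution are illustrative assumptions, not the paper's actual benchmark specification.

```python
import random

def simulate_base_stock(base_stock_level, demands,
                        unit_cost=1.0, price=2.0, holding_cost=0.1):
    """Simulate a multi-period inventory system under a base-stock policy.

    Each period: order up to base_stock_level, observe demand, sell what
    inventory allows, and pay a holding cost on leftover stock.
    Returns total profit over the horizon. (Illustrative sketch; ignores
    lead times and lost-sales penalties.)
    """
    inventory = 0
    profit = 0.0
    for demand in demands:
        order_qty = max(base_stock_level - inventory, 0)
        inventory += order_qty
        profit -= unit_cost * order_qty      # procurement cost
        sales = min(inventory, demand)
        profit += price * sales              # revenue
        inventory -= sales
        profit -= holding_cost * inventory   # carrying cost on leftovers
    return profit

# Illustrative usage: pick the best base-stock level for a sampled horizon.
random.seed(0)
demands = [random.randint(5, 15) for _ in range(20)]
best = max(range(5, 21), key=lambda s: simulate_base_stock(s, demands))
print("best base-stock level:", best)
```

A rule like this performs well when demand is stationary but can degrade under the demand shifts and seasonality the benchmark is designed to stress, which is the gap contextual LLM-based methods aim to fill.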