Inventory management remains a challenge for many small and medium-sized businesses that lack the expertise to deploy advanced optimization methods. This paper investigates whether Large Language Models (LLMs) can help bridge this gap. We show that employing LLMs as direct, end-to-end solvers incurs a significant "hallucination tax": a performance gap arising from the model's inability to perform grounded stochastic reasoning. To address this, we propose a hybrid agentic framework that strictly decouples semantic reasoning from mathematical calculation. In this architecture, the LLM functions as an intelligent interface: it elicits parameters from natural language, interprets results, and automatically calls rigorous algorithms to build the optimization engine. To evaluate this interactive system against the ambiguity and inconsistency of real-world managerial dialogue, we introduce the Human Imitator, a fine-tuned "digital twin" of a boundedly rational manager that enables scalable, reproducible stress-testing. Our empirical analysis reveals that the hybrid agentic framework reduces total inventory costs by 32.1% relative to an interactive baseline using GPT-4o as an end-to-end solver. Moreover, we find that providing perfect ground-truth information alone is insufficient to improve GPT-4o's performance, confirming that the bottleneck is fundamentally computational rather than informational. Our results position LLMs not as replacements for operations research, but as natural-language interfaces that make rigorous, solver-based policies accessible to non-experts.
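To make the decoupling of semantic reasoning from mathematical calculation concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes a single-period newsvendor setting with normally distributed demand, which the abstract does not specify. The names `ElicitedParams` and `newsvendor_order_quantity` are hypothetical. The dataclass stands in for whatever the LLM extracts from the manager's dialogue, and the function stands in for the deterministic solver that the LLM would call rather than computing the answer itself.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class ElicitedParams:
    """Hypothetical container for parameters an LLM elicits from dialogue."""
    mean_demand: float      # expected demand per period
    std_demand: float       # demand standard deviation (must be > 0)
    unit_cost: float        # purchase cost per unit
    unit_price: float       # selling price per unit
    salvage_value: float = 0.0  # value recovered per unsold unit


def newsvendor_order_quantity(p: ElicitedParams) -> float:
    """Solver side: classic critical-fractile order quantity.

    Assumes normally distributed demand (an illustrative assumption).
    """
    underage = p.unit_price - p.unit_cost      # profit lost per unit short
    overage = p.unit_cost - p.salvage_value    # loss per unit left over
    if underage <= 0 or overage <= 0:
        raise ValueError("need salvage_value < unit_cost < unit_price")
    critical_ratio = underage / (underage + overage)
    demand = NormalDist(mu=p.mean_demand, sigma=p.std_demand)
    return demand.inv_cdf(critical_ratio)


# Example: parameters as they might be elicited from a manager's description.
params = ElicitedParams(mean_demand=100.0, std_demand=20.0,
                        unit_cost=5.0, unit_price=10.0)
order_qty = newsvendor_order_quantity(params)
print(f"Recommended order quantity: {order_qty:.1f}")
```

In this sketch the language model never performs the stochastic calculation. It only fills in `ElicitedParams` and explains the returned quantity, which is the division of labor the abstract describes.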