This paper presents a systematic study of the performance of large language models (LLMs), specifically ChatGPT, in automatically formulating and solving Stochastic Optimization (SO) problems from natural-language descriptions. Focusing on three key categories of problems (individual chance-constrained models, joint chance-constrained models, and two-stage stochastic mixed-integer linear programming models), we design several prompts that guide ChatGPT through structured tasks using chain-of-thought and agentic reasoning. We introduce a novel soft-scoring metric that evaluates the structural quality and partial correctness of generated models, addressing the limitations of canonical and execution-based accuracy metrics. Across a diverse set of SO problems, GPT-4-Turbo achieves higher partial scores than the GPT-3.5 variants, except on individual chance-constrained problems. Structured prompts significantly outperform simple prompting, reducing the generation of extraneous model elements and improving objective matching, although suppressing extraneous elements remains a nontrivial challenge. Our findings show that, with well-engineered prompts and multi-agent collaboration, LLMs can facilitate SO formulation, paving the way for intelligent, language-driven modeling pipelines for SO in practice.