Automatically formulating optimization models from natural language descriptions is a growing focus in operations research, yet current LLM-based approaches struggle with the composite constraints and appropriate modeling paradigms required by complex operational rules. To address this, we introduce the Canonical Intermediate Representation (CIR): a schema that LLMs explicitly generate between problem descriptions and optimization models. CIR encodes the semantics of operational rules through constraint archetypes and candidate modeling paradigms, thereby decoupling rule logic from its mathematical instantiation. Building on a newly constructed CIR knowledge base, we develop the Rule-to-Constraint (R2C) framework, a multi-agent pipeline that parses problem texts, synthesizes CIR implementations by retrieving domain knowledge, and instantiates optimization models. To systematically evaluate rule-to-constraint reasoning, we test R2C on our newly constructed benchmark featuring rich operational rules, as well as on benchmarks from prior work. Extensive experiments show that R2C achieves state-of-the-art accuracy on the proposed benchmark (47.2% accuracy). On established benchmarks from the literature, R2C delivers highly competitive results, approaching the performance of proprietary models (e.g., GPT-5). Moreover, with a reflection mechanism, R2C achieves further gains and sets new best-reported results on some benchmarks.
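As an illustration of the decoupling CIR aims for, a single entry might pair an operational rule with a constraint archetype and several candidate modeling paradigms, deferring the mathematical instantiation to a later step. The sketch below is a minimal toy example; the field names (`archetype`, `paradigms`, `parameters`) and the big-M instantiation are illustrative assumptions, not the paper's actual schema:

```python
# Hypothetical sketch of one CIR entry. All field names and values are
# illustrative assumptions, not the paper's actual schema.
cir_entry = {
    # Operational rule as parsed from the problem description.
    "rule": "Each machine must cool down for at least 2 hours between jobs",
    # Constraint archetype: the semantic class of the rule.
    "archetype": "minimum_separation",
    # Candidate modeling paradigms, decoupled from any single instantiation.
    "paradigms": ["big_M_sequencing", "time_indexed", "disjunctive"],
    # Numeric parameters extracted from the rule text.
    "parameters": {"separation": 2, "unit": "hours"},
}


def instantiate(entry: dict, paradigm: str) -> str:
    """Instantiate a CIR entry under one chosen paradigm.

    Toy example: emits a big-M sequencing constraint as a string, where
    y_ij = 1 if job i precedes job j on the same machine.
    """
    if paradigm not in entry["paradigms"]:
        raise ValueError(f"{paradigm} is not a candidate for this rule")
    sep = entry["parameters"]["separation"]
    if paradigm == "big_M_sequencing":
        return f"end_i + {sep} <= start_j + M * (1 - y_ij)"
    raise NotImplementedError(paradigm)


print(instantiate(cir_entry, "big_M_sequencing"))
```

Because the archetype and candidate paradigms are stored separately from the instantiation, a downstream agent can choose a different paradigm (e.g., time-indexed) for the same rule without re-parsing the problem text.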