Large Reasoning Models (LRMs) have demonstrated strong capabilities in complex multi-step reasoning, opening new opportunities for automating optimization modeling. However, existing domain adaptation methods, originally designed for earlier instruction-tuned models, often fail to exploit the advanced reasoning patterns of modern LRMs; in particular, we show that direct fine-tuning on traditional \textit{non-reflective} datasets yields only limited gains. To fully leverage the inherent reasoning abilities of LRMs, we propose \textbf{CALM} (\textit{Corrective Adaptation with Lightweight Modification}), a framework that progressively refines LRMs within their native reasoning modes for optimization modeling tasks. In CALM, an expert intervener identifies reasoning flaws and provides concise corrective hints, which the LRM incorporates to produce improved reasoning trajectories. These interventions modify fewer than 2.6\% of generated tokens, yet yield high-quality data for soft adaptation through supervised fine-tuning. The adapted model is then further improved through reinforcement learning. Building on CALM, we develop \textbf{STORM} (\textit{Smart Thinking Optimization Reasoning Model}), a 4B-parameter LRM that achieves a new state-of-the-art average accuracy of 68.9\% across five popular optimization modeling benchmarks, matching the performance of a 671B-parameter LRM. These results demonstrate that dynamic, hint-based data synthesis both preserves and amplifies the native reasoning patterns of modern LRMs, offering a more effective and scalable path toward expert-level performance on challenging optimization modeling tasks.