Autonomous machine learning agents have revolutionized scientific discovery, yet they remain constrained by a Generate-Execute-Feedback paradigm. Previous approaches suffer from a severe Execution Bottleneck, because hypothesis evaluation relies strictly on expensive physical execution. Drawing inspiration from World Models, we bypass these physical constraints by internalizing execution priors, substituting costly runtime checks with instantaneous predictive reasoning. In this work, we formalize the task of Data-centric Solution Preference and construct a comprehensive corpus of 18,438 pairwise comparisons. We demonstrate that LLMs exhibit significant predictive capabilities when primed with a Verified Data Analysis Report, achieving 61.5% accuracy and robust confidence calibration. Finally, we instantiate this framework in FOREAGENT, an agent that employs a Predict-then-Verify loop, achieving a 6× acceleration in convergence while surpassing execution-based baselines by 6%. Our code and dataset will be publicly available soon at https://github.com/zjunlp/predict-before-execute.
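To make the Predict-then-Verify idea concrete, the following is a minimal sketch under stated assumptions, not FOREAGENT's actual implementation. It assumes that a pairwise preference predictor prunes candidate solutions first and that only the surviving candidate is physically executed. The names `predict_then_verify`, `prefer`, `execute`, and the toy scores are hypothetical and do not come from the released code.

```python
from typing import Callable, List, Sequence, Tuple


def predict_then_verify(
    candidates: Sequence[str],
    prefer: Callable[[str, str], bool],
    execute: Callable[[str], float],
) -> Tuple[str, float]:
    """Pick a candidate by pairwise prediction, then execute only that one.

    prefer(a, b) is a hypothetical predictor returning True when solution
    `a` is expected to outperform solution `b` (for example, an LLM primed
    with a data analysis report). execute(s) is the expensive real run.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    # Predict: a sequential pairwise tournament that costs no executions.
    best = candidates[0]
    for challenger in candidates[1:]:
        if prefer(challenger, best):
            best = challenger

    # Verify: a single real execution of the predicted winner.
    return best, execute(best)


# Toy stand-ins (assumptions for illustration only).
TRUE_SCORES = {"baseline": 0.71, "feature_eng": 0.78, "bigger_model": 0.74}
execution_log: List[str] = []


def toy_prefer(a: str, b: str) -> bool:
    # A perfect oracle; a real predictor would be noisy.
    return TRUE_SCORES[a] > TRUE_SCORES[b]


def toy_execute(solution: str) -> float:
    execution_log.append(solution)
    return TRUE_SCORES[solution]


winner, score = predict_then_verify(list(TRUE_SCORES), toy_prefer, toy_execute)
print(winner, score, len(execution_log))
```

In this toy run, three candidates are compared but only one is executed, which is where the savings over a purely execution-based loop would come from. Because the reported preference accuracy is 61.5%, a real predictor would sometimes pick the wrong candidate, so the verification step remains necessary.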