As scientific workflows shift from deterministic executables to LLM-based agents, the development practices on offer, such as fine-tuning, reinforcement learning, and prompt-and-go, bury the scientist's judgment. We propose treating agent construction as a workflow stage and introduce AgentBuild, which builds a scientific agent from a contract the scientist authors. The contract consists of a version-controlled rubric, a difficulty-graded curriculum, and a curated external knowledge base. A rubric-driven judge gates a meta-optimizer coding agent that edits the agent within a declared boundary, so the build compiles the agent, not the scientist's judgment. We instantiate this for Rietveld refinement of X-ray diffraction data through GSAS-II behind MCP and A2A, where a blank-harness construction run progresses through a lithium lanthanum zirconium oxide (LLZO) signal-to-noise ladder, reaches the 4-hour scan as a frontier case, and exposes the workflow-scope limits that remain. The same rubric that rewards credible fits also scores trajectory scope, making the frontier a contract failure rather than a pattern-fitting failure. As base models evolve, re-running AgentBuild is a re-tune, not a rebuild, and the scientist's authored contract remains the durable asset.