We present MOSAIC, a multi-agent Large Language Model (LLM) framework for solving challenging scientific coding tasks. Unlike general-purpose coding, scientific workflows require algorithms that are rigorous, grounded in deep domain knowledge, and informed by domain-specific reasoning, and these algorithms must be iterated on without I/O test cases. Many scientific problems also require solving a sequence of subproblems to reach the final result. MOSAIC is a training-free framework whose specialized agents self-reflect, generate rationales, write code, and debug within a student-teacher paradigm to address the challenges of scientific code generation. This design facilitates stepwise problem decomposition and targeted error correction and, when combined with our Consolidated Context Window (CCW), mitigates LLM hallucinations on complex scientific tasks involving chained subproblems. We evaluate MOSAIC on scientific coding benchmarks and demonstrate that our specialized agentic framework outperforms existing approaches in accuracy, robustness, and interpretability.
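As a rough illustration of the pipeline described above, the following minimal sketch chains rationale, coding, and debugging agents over a sequence of subproblems while carrying a compact context between steps. It is not the MOSAIC implementation: the names `ConsolidatedContext`, `solve_chain`, `try_run`, and the stub agents are hypothetical, the agents are plain callables standing in for LLM calls, the student-teacher interaction is not modeled, and the assumption that the CCW stores a bounded list of short summaries of earlier solutions is ours. Because no I/O test cases are available, the debug loop only checks that the generated code executes.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# An "agent" here is any function from a prompt string to a response string.
# In practice this would wrap an LLM call; that wiring is omitted.
Agent = Callable[[str], str]


@dataclass
class ConsolidatedContext:
    """Hypothetical stand-in for the CCW (assumption): a bounded list of
    one-line summaries of previously solved subproblems."""
    max_entries: int = 3
    entries: List[str] = field(default_factory=list)

    def add(self, summary: str) -> None:
        self.entries.append(summary)
        # Keep only the most recent summaries so prompts stay short.
        self.entries = self.entries[-self.max_entries:]

    def render(self) -> str:
        return "\n".join(self.entries)


def try_run(code: str) -> str:
    """Return '' if the code compiles and executes, else an error message."""
    try:
        exec(compile(code, "<generated>", "exec"), {})
        return ""
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def solve_chain(subproblems: List[str], rationale_agent: Agent,
                coding_agent: Agent, debug_agent: Agent,
                ccw: ConsolidatedContext,
                max_debug_rounds: int = 2) -> List[str]:
    """Solve subproblems in order, passing consolidated context forward."""
    solutions = []
    for task in subproblems:
        context = ccw.render()
        rationale = rationale_agent(f"{context}\n{task}")
        code = coding_agent(f"{context}\n{task}\n{rationale}")
        # Execution-based debugging: no reference outputs are assumed.
        for _ in range(max_debug_rounds):
            error = try_run(code)
            if not error:
                break
            code = debug_agent(f"{code}\n{error}")
        solutions.append(code)
        # Summarize the step by its first line (e.g. a function signature).
        first_line = next(iter(code.splitlines()), "")
        ccw.add(f"{task}: {first_line}")
    return solutions


# Stub agents for demonstration only; they ignore their prompts.
def stub_rationale(prompt: str) -> str:
    return "Define a small helper function."


def stub_coder(prompt: str) -> str:
    return "def f(x):\n    return x +"  # deliberately broken


def stub_debugger(prompt: str) -> str:
    return "def f(x):\n    return x + 1"
```

With the stub agents, `solve_chain(["step 1", "step 2"], stub_rationale, stub_coder, stub_debugger, ConsolidatedContext())` repairs the broken draft once per step and records one summary line per subproblem. Note that `exec` runs generated code directly, so a real system would need sandboxing.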