A Multi-Agent Framework for Code-Guided, Modular, and Verifiable Automated Machine Learning

Automated Machine Learning (AutoML) has revolutionized the development of data-driven solutions; however, traditional frameworks often function as "black boxes", lacking the flexibility and transparency required for complex, real-world engineering tasks. Recent Large Language Model (LLM)-based agents have shifted toward code-driven approaches, but they frequently suffer from hallucinated logic and from logic entanglement, in which monolithic code generation leads to unrecoverable runtime failures. In this paper, we present iML, a novel multi-agent framework designed to shift AutoML from black-box prompting to a code-guided, modular, and verifiable architectural paradigm. iML introduces three main ideas: (1) Code-Guided Planning, which synthesizes a strategic blueprint grounded in autonomous empirical profiling to eliminate hallucination; (2) Code-Modular Implementation, which decouples preprocessing and modeling into specialized components governed by strict interface contracts; and (3) Code-Verifiable Integration, which enforces physical feasibility through dynamic contract verification and iterative self-correction. We evaluate iML on MLE-BENCH and on the newly introduced iML-BENCH, which together cover a diverse range of real-world Kaggle competitions. The experimental results show that iML outperforms state-of-the-art agents, achieving a valid submission rate of 85% and a competitive medal rate of 45% on MLE-BENCH, with an average standardized performance score (APS) of 0.77. On iML-BENCH, iML outperforms the other approaches by 38% to 163% in APS. Furthermore, iML maintains a robust 70% success rate even under stripped task descriptions, filling information gaps through empirical profiling. These results highlight iML's potential to bridge the gap between stochastic generation and reliable engineering, marking a meaningful step toward truly automated machine learning.
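
To make the second and third ideas concrete, the following is a minimal sketch of how an interface contract between a preprocessing component and a modeling component might be checked at integration time. It is not iML's implementation: the names `Contract`, `violations`, and `integrate`, and the specific checks (feature count, row count, no missing values), are all assumptions made for illustration. The self-correction step is also simplified: a fixed list of candidate preprocessors stands in for an agent that would regenerate code from the reported violations.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Row = List[float]
Preprocessor = Callable[[Sequence[Row]], List[Row]]


@dataclass(frozen=True)
class Contract:
    """Hypothetical interface contract for preprocessing output."""
    n_features: int
    allow_missing: bool = False


def violations(rows: Sequence[Row], n_input_rows: int, contract: Contract) -> List[str]:
    """Return a human-readable list of contract violations (empty if none)."""
    problems: List[str] = []
    if len(rows) != n_input_rows:
        problems.append(f"row count changed: {n_input_rows} -> {len(rows)}")
    for i, row in enumerate(rows):
        if len(row) != contract.n_features:
            problems.append(
                f"row {i}: expected {contract.n_features} features, got {len(row)}"
            )
        elif not contract.allow_missing and any(math.isnan(v) for v in row):
            problems.append(f"row {i}: contains NaN")
    return problems


def integrate(
    raw: Sequence[Row], candidates: Sequence[Preprocessor], contract: Contract
) -> Optional[List[Row]]:
    """Try candidates in order; return the first output that satisfies the contract.

    In an agent setting, the printed feedback would drive code regeneration;
    here the next fixed candidate is simply tried (an assumption of this sketch).
    """
    for attempt, preprocess in enumerate(candidates, start=1):
        try:
            out = preprocess(raw)
        except Exception as exc:  # a runtime failure counts as a failed attempt
            print(f"attempt {attempt}: runtime error: {exc}")
            continue
        problems = violations(out, len(raw), contract)
        if not problems:
            return out
        print(f"attempt {attempt}: contract violated: {problems}")
    return None


def passthrough(rows: Sequence[Row]) -> List[Row]:
    # Copies the data unchanged, so any missing values survive.
    return [list(r) for r in rows]


def impute_zero(rows: Sequence[Row]) -> List[Row]:
    # Replaces missing values with 0.0.
    return [[0.0 if math.isnan(v) else v for v in r] for r in rows]


raw = [[1.0, float("nan")], [3.0, 4.0]]
contract = Contract(n_features=2)
result = integrate(raw, [passthrough, impute_zero], contract)
```

In this sketch the first candidate fails verification because a missing value reaches the modeling boundary, and the second candidate passes. The point of the design is that a failure is caught at the component boundary with a specific message, instead of surfacing later as an opaque error inside model training.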
