iML: Executable, Problem-Grounded, and Broadly Exploratory Code-Driven AutoML

Automated Machine Learning (AutoML) has improved access to machine learning, yet existing techniques often remain limited in flexibility, transparency, and execution reliability. Code-driven AutoML offers a promising direction by synthesizing executable code for preprocessing, model training, and evaluation. However, current LLM-based approaches frequently generate code that is plausible as text yet brittle in execution, insufficiently grounded in the actual dataset, or restricted to narrow solution paths. In this paper, we introduce iML, a multi-agent code-driven AutoML framework designed around three requirements: executability, problem grounding, and broad exploration of valid solutions. iML first analyzes the task and profiles the data, then synthesizes a structured blueprint that guides modular code generation across multiple implementation tracks, including traditional ML, pretrained adaptation, and custom neural architectures. To improve reliability, iML enforces interface checking, dynamic execution, and iterative debugging during integration. We evaluate iML on MLE-BENCH and the newly introduced iML-BENCH, covering diverse Kaggle-style tasks. On MLE-BENCH, iML attains a 90% valid submission rate, a 45% medal rate, and an average standardized performance score (APS) of 0.82, improving APS over the LLM-based baselines by 52%-273%. On iML-BENCH, it achieves the highest APS and demonstrates robust performance even when task descriptions are substantially stripped. These results establish iML as a reliable and competitive framework for code-driven AutoML.
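The integration stage named in the abstract (interface checking, dynamic execution, iterative debugging) can be illustrated with a minimal sketch. This is not iML's implementation: the function names `check_interface`, `run_candidate`, and `integrate`, the arity-based interface contract, and the `repair` callback (a stand-in for an LLM debugging agent) are all assumptions made for illustration. A real system would also execute candidates in an isolated process rather than with in-process `exec`.

```python
from __future__ import annotations

import ast
import traceback
from typing import Callable


def check_interface(source: str, required: dict[str, int]) -> list[str]:
    """Static check: each required top-level function exists with the expected
    number of positional parameters. Returns a list of problems (empty if none)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]
    found = {
        node.name: len(node.args.args)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }
    problems = []
    for name, n_args in required.items():
        if name not in found:
            problems.append(f"missing function {name}")
        elif found[name] != n_args:
            problems.append(f"{name} takes {found[name]} args, expected {n_args}")
    return problems


def run_candidate(source: str) -> tuple[bool, str]:
    """Dynamic check: execute the candidate and capture any traceback.
    Assumption: in-process exec is used only to keep the sketch short."""
    namespace = {"__name__": "__candidate__"}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
    except Exception:
        return False, traceback.format_exc()
    return True, ""


def integrate(
    source: str,
    required: dict[str, int],
    repair: Callable[[str, str], str],  # hypothetical: (source, feedback) -> new source
    max_rounds: int = 3,
) -> tuple[str, bool, int]:
    """Check, execute, and repair until the candidate runs or rounds run out.
    Returns (final source, success flag, rounds used)."""
    for round_no in range(1, max_rounds + 1):
        problems = check_interface(source, required)
        if problems:
            feedback = "; ".join(problems)
        else:
            ok, feedback = run_candidate(source)
            if ok:
                return source, True, round_no
        if round_no < max_rounds:
            # Feed the failure message back to the repair step.
            source = repair(source, feedback)
    return source, False, max_rounds
```

The static check runs before execution so that a missing or mis-shaped function is reported without running any generated code; only candidates that pass it are executed, and the captured traceback becomes the input to the next repair round.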
