Existing machine learning engineering (MLE) agents struggle to iteratively optimize the algorithms they implement. To address this, we introduce MLE-Ideator, a dual-agent framework that separates ideation from implementation. In our system, an implementation agent can request strategic help from a dedicated Ideator. We show this approach is effective in two ways. First, in a training-free setup, our framework significantly outperforms implementation-only agent baselines on MLE-Bench. Second, we demonstrate that the Ideator can be trained with reinforcement learning (RL) to generate more effective ideas. With only 1K training samples from 10 MLE tasks, our RL-trained Qwen3-8B Ideator achieves an 11.5% relative improvement over its untrained counterpart and surpasses Claude Sonnet 3.5. These results highlight a promising path toward training strategic AI systems for scientific discovery.