Large language model (LLM) agents show promise in automating machine learning (ML) engineering. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, where human researchers often gain insights and contribute by sharing knowledge. To bridge this gap, we introduce MLE-Live, a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community. Building on this framework, we propose CoMind, a multi-agent system designed to systematically leverage external knowledge. CoMind employs an iterative parallel exploration mechanism, developing multiple solutions simultaneously to balance exploratory breadth with implementation depth. On 75 past Kaggle competitions within our MLE-Live framework, CoMind achieves a 36% medal rate, establishing a new state of the art. Critically, when deployed in eight live, ongoing competitions, CoMind outperforms 92.6% of human competitors on average, placing in the top 5% on three official leaderboards and the top 1% on one.