Large Language Models (LLMs) have revolutionized automated data analytics and machine learning by enabling dynamic reasoning and adaptability. While recent approaches have advanced multi-stage pipelines through multi-agent systems, they typically rely on rigid, single-path workflows that limit the exploration and integration of diverse strategies, often resulting in suboptimal predictions. To address these challenges, we propose SPIO (Sequential Plan Integration and Optimization), a novel framework that leverages LLM-driven decision-making to orchestrate multi-agent planning across four key modules: data preprocessing, feature engineering, modeling, and hyperparameter tuning. In each module, dedicated planning agents independently generate candidate strategies that cascade into subsequent stages, fostering comprehensive exploration. A plan optimization agent then refines these strategies by proposing several optimized plans. We further introduce two variants: SPIO-S, which selects a single best solution path as determined by the LLM, and SPIO-E, which selects the top-k candidate plans and ensembles them to maximize predictive performance. Extensive experiments on Kaggle and OpenML datasets demonstrate that SPIO significantly outperforms state-of-the-art methods, providing a robust and scalable solution for automated data science tasks.
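The difference between the two variants can be illustrated with a minimal sketch. It is not the authors' implementation: the `CandidatePlan` class, the numeric `score` field (a stand-in for the LLM's ranking of plans), the example predictions, and the use of simple prediction averaging as the ensembling rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidatePlan:
    """Hypothetical container for one end-to-end candidate plan."""
    name: str
    score: float              # assumed ranking score (stand-in for the LLM's judgment)
    predictions: List[float]  # assumed per-sample predicted probabilities


def select_single(plans: List[CandidatePlan]) -> CandidatePlan:
    """SPIO-S-style selection: keep only the highest-ranked plan."""
    return max(plans, key=lambda p: p.score)


def select_top_k(plans: List[CandidatePlan], k: int) -> List[CandidatePlan]:
    """SPIO-E-style selection: keep the k highest-ranked plans."""
    return sorted(plans, key=lambda p: p.score, reverse=True)[:k]


def ensemble_mean(plans: List[CandidatePlan]) -> List[float]:
    """Average predictions across plans (an assumed ensembling rule)."""
    n_samples = len(plans[0].predictions)
    return [
        sum(p.predictions[i] for p in plans) / len(plans)
        for i in range(n_samples)
    ]


# Made-up candidate plans, for illustration only.
plans = [
    CandidatePlan("plan_a", 0.81, [0.9, 0.2, 0.6]),
    CandidatePlan("plan_b", 0.85, [0.8, 0.1, 0.7]),
    CandidatePlan("plan_c", 0.78, [0.7, 0.3, 0.4]),
]

best = select_single(plans)      # single-path variant
top2 = select_top_k(plans, k=2)  # top-k variant
ensembled = ensemble_mean(top2)  # combined predictions of the top-k plans
```

In this sketch the single-path variant returns one plan's predictions unchanged, while the ensemble variant blends the predictions of the k best-ranked plans, which is the contrast the abstract draws between SPIO-S and SPIO-E.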