Large Language Models (LLMs) have enabled dynamic reasoning in automated data analytics, yet recent multi-agent systems remain limited by rigid, single-path workflows that restrict strategic exploration and often lead to suboptimal outcomes. To overcome these limitations, we propose SPIO (Sequential Plan Integration and Optimization), a framework that replaces rigid workflows with adaptive, multi-path planning across four core modules: data preprocessing, feature engineering, model selection, and hyperparameter tuning. In each module, specialized agents generate diverse candidate strategies, which are cascaded and refined by an optimization agent. SPIO offers two operating modes: SPIO-S for selecting a single optimal pipeline, and SPIO-E for ensembling top-k pipelines to maximize robustness. Extensive evaluations on Kaggle and OpenML benchmarks show that SPIO consistently outperforms state-of-the-art baselines, achieving an average performance gain of 5.6%. By explicitly exploring and integrating multiple solution paths, SPIO delivers a more flexible, accurate, and reliable foundation for automated data science.
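As an illustration of the two operating modes, the following minimal sketch cascades candidate strategies from the four modules into full pipelines, then either keeps the single best one (SPIO-S) or the top-k for ensembling (SPIO-E). It is a toy model under stated assumptions, not the authors' implementation: the strategy names, the integer `TOY_WEIGHTS`, the additive `toy_score` function, the exhaustive Cartesian product, and prediction averaging as the ensembling rule are all hypothetical stand-ins. In SPIO itself, candidate generation and refinement are performed by LLM agents, and pipelines would be ranked by validation performance.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean

# The four modules named in the abstract, in cascade order.
MODULES = ("preprocessing", "feature_engineering",
           "model_selection", "hyperparameter_tuning")

# Hypothetical candidate strategies per module (illustrative names only).
CANDIDATES = {
    "preprocessing": ["median_impute", "drop_rows"],
    "feature_engineering": ["target_encode", "one_hot"],
    "model_selection": ["gbm", "linear"],
    "hyperparameter_tuning": ["grid", "bayes"],
}

# Hypothetical per-strategy weights standing in for a validation score.
TOY_WEIGHTS = {
    "median_impute": 2, "drop_rows": 1,
    "target_encode": 3, "one_hot": 2,
    "gbm": 5, "linear": 3,
    "grid": 1, "bayes": 2,
}


@dataclass(frozen=True)
class Pipeline:
    steps: tuple   # one strategy name per module, in MODULES order
    score: int     # higher is better (toy stand-in for validation score)


def toy_score(steps):
    """Assumed scoring rule: sum of the toy weights of the chosen strategies."""
    return sum(TOY_WEIGHTS[s] for s in steps)


def enumerate_pipelines(candidates, score_fn):
    """Cascade candidates across modules (assumed: full Cartesian product)."""
    combos = product(*(candidates[m] for m in MODULES))
    return [Pipeline(steps=c, score=score_fn(c)) for c in combos]


def select_single(pipelines):
    """SPIO-S style: keep the single highest-scoring pipeline."""
    return max(pipelines, key=lambda p: p.score)


def select_top_k(pipelines, k):
    """SPIO-E style: keep the k highest-scoring pipelines (stable on ties)."""
    return sorted(pipelines, key=lambda p: p.score, reverse=True)[:k]


def ensemble_predict(per_pipeline_predictions):
    """Assumed ensembling rule: average predictions element-wise."""
    return [mean(column) for column in zip(*per_pipeline_predictions)]


ALL_PIPELINES = enumerate_pipelines(CANDIDATES, toy_score)
BEST = select_single(ALL_PIPELINES)
TOP_3 = select_top_k(ALL_PIPELINES, k=3)
```

Here `BEST` plays the role of the SPIO-S output and `TOP_3` the set of pipelines that SPIO-E would ensemble. Passing each selected pipeline's predictions to `ensemble_predict` yields the averaged result.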