Algorithmic problem solving is a rigorous testbed for AI coding systems because it directly reflects a model's ability to perform structured reasoning in complex scenarios. Existing approaches predominantly rely on model-centric strategies, such as architectural modifications and data scaling, which are costly and offer limited interpretability. Alternative methods that leverage external tools or prompting techniques (e.g., chain-of-thought) are often fragmented and lack a unified framework. In this paper, we propose MAS-Algorithm, a systematic multi-agent workflow for algorithmic problem solving inspired by the practices of competitive programmers and algorithm engineers. Our framework decomposes the end-to-end solving process into modular stages, enabling structured reasoning, tool integration, and flexible coordination among agents. The design emphasizes both rigor and extensibility, allowing it to generalize across diverse problem types. Experimental results on a self-constructed benchmark show consistent improvements across multiple Qwen series models, with an average gain of 6.48% in acceptance rate. In contrast, parameter-efficient fine-tuning on the same data yields only a marginal improvement of 0.89%. We further observe a 4.72% gain on LiveCodeBench-Pro, along with consistent improvements across additional accuracy and efficiency metrics. Beyond performance gains, we conduct comprehensive analyses of the reasoning process within the workflow, including error patterns and cross-scenario behaviors. We further perform customized replacement and ablation studies to explore the upper bound of the framework, showing that individual agents can contribute improvements of up to 27.7%. These results highlight the potential of MAS-Algorithm for advancing AI-driven algorithmic reasoning.