While Large Language Models (LLMs) have catalyzed breakthroughs in automated code generation, Small Language Models (SLMs) often encounter reasoning bottlenecks and failure loops when addressing complex logical requirements. To overcome these challenges, we propose DebateCoder, a multi-agent collaborative framework designed to improve the reasoning ability of SLMs (e.g., Pangu-1B) in resource-constrained environments. DebateCoder uses a structured role-playing protocol with three agents: a User Agent (A_UA), a Technical Agent (A_TA), and a Quality Assurance Agent (A_QA). The framework further incorporates an Adaptive Confidence Gating mechanism with a 95% threshold to balance accuracy against inference cost, together with a multi-turn deliberation module and a reviewer-guided analytical debugging loop that provide orthogonal pre-generation debate and post-generation refinement. Experiments on HumanEval and MBPP show that DebateCoder achieves 70.12% Pass@1 on HumanEval, outperforming MapCoder while reducing API overhead by about 35%. These results indicate that collaborative protocols can mitigate the limitations of small-parameter models and offer a scalable, efficient path to high-quality automated software engineering.
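The Adaptive Confidence Gating mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes confidence is proxied by the mean per-token probability of a fast single-model draft, and the function names (`fast_generate`, `debate_generate`) are hypothetical stand-ins for the draft path and the multi-agent debate path.

```python
import math

CONFIDENCE_THRESHOLD = 0.95  # gating threshold stated in the abstract


def mean_token_confidence(token_logprobs):
    """Average per-token probability as a simple confidence proxy (assumed)."""
    probs = [math.exp(lp) for lp in token_logprobs]
    return sum(probs) / len(probs)


def gated_generate(prompt, fast_generate, debate_generate):
    """Accept the cheap single-model draft when confidence clears the gate;
    otherwise escalate to the (more expensive) multi-agent debate pipeline."""
    draft, token_logprobs = fast_generate(prompt)
    if mean_token_confidence(token_logprobs) >= CONFIDENCE_THRESHOLD:
        return draft  # high-confidence fast path: no extra API calls
    return debate_generate(prompt, draft)  # low confidence: deliberate
```

Routing only low-confidence cases into the debate is what lets such a gate trade a small accuracy risk for a substantial reduction in API overhead, consistent with the roughly 35% saving reported above.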