Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models

Xudong Wang, Chaoning Zhang, Chenghao Li, Shuxu Chen, Qigan Sun, Jiaquan Zhang, Fachrina Dewi Puspitasari, Tae-Ho Kim, Jiwei Wei, Malu Zhang, Guoqing Wang, Yang Yang, Heng Tao Shen

From arXiv. Accepted to ACL 2026. 9 pages, 5 figures.

Large Language Models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, and recent prompting strategies such as Chain-of-Thought (CoT) have further improved their performance on complex logical problems. Despite these advances, high-quality reasoning still relies heavily on manually written, static prompts and is sensitive to decoding configurations and task distributions, which leads to performance fluctuations and limited transferability. Existing automatic prompt optimization methods typically adopt single-agent local search and do not optimize prompts and decoding hyperparameters together within a unified framework, so they fail to achieve stable global improvements. To address this limitation, we propose Agent-GWO, a dynamic prompt optimization framework for complex reasoning. Specifically, we unify prompt templates and decoding hyperparameters as inheritable agent configurations. By leveraging the leader-follower mechanism of the Grey Wolf Optimizer (GWO), we automatically select three leader agents ($α$, $β$, and $δ$) to guide the collaborative updates of the remaining agents, enabling iterative convergence toward robust reasoning configurations that can be used directly at inference time. Extensive experiments on multiple mathematical and hybrid reasoning benchmarks across diverse LLM backbones show that Agent-GWO consistently improves accuracy and stability over existing prompt optimization methods. The code will be released publicly.
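
The leader-follower update described above can be sketched roughly as follows. This is a minimal illustration of the standard Grey Wolf Optimizer position update applied to two decoding hyperparameters, not the authors' implementation. The `AgentConfig` fields, the `BOUNDS` values, the rule that followers inherit a prompt template from a randomly chosen leader, the choice to keep leaders unchanged, and the `fitness` callback (for example, accuracy on a validation set) are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class AgentConfig:
    """Hypothetical agent configuration: a prompt template plus decoding settings."""
    prompt: str
    temperature: float
    top_p: float


# Assumed search bounds for the numeric hyperparameters (not from the paper).
BOUNDS = {"temperature": (0.0, 1.5), "top_p": (0.1, 1.0)}


def clip(value, low, high):
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


def gwo_step(agents, fitness, a, rng):
    """One GWO iteration over a population of agent configurations.

    agents  : list of AgentConfig (at least three)
    fitness : callable AgentConfig -> float, higher is better (assumed)
    a       : exploration coefficient, usually decayed from 2 to 0
    rng     : random.Random instance
    """
    ranked = sorted(agents, key=fitness, reverse=True)
    leaders = ranked[:3]          # alpha, beta, delta
    updated = list(leaders)       # assumption: leaders are carried over unchanged

    for agent in ranked[3:]:
        new_values = {}
        for name, (low, high) in BOUNDS.items():
            x = getattr(agent, name)
            candidates = []
            for leader in leaders:
                lead = getattr(leader, name)
                A = 2 * a * rng.random() - a   # step size, in [-a, a]
                C = 2 * rng.random()           # leader weighting, in [0, 2]
                D = abs(C * lead - x)          # distance to the (weighted) leader
                candidates.append(lead - A * D)
            # Standard GWO: average the three leader-guided positions.
            new_values[name] = clip(sum(candidates) / 3, low, high)

        # Assumption: the discrete prompt is inherited from a random leader.
        prompt = rng.choice(leaders).prompt
        updated.append(AgentConfig(prompt=prompt, **new_values))

    return updated
```

In a full loop, `a` would typically be decayed linearly from 2 to 0 across iterations, so early steps explore widely and later steps pull followers toward the average of the three leaders. How Agent-GWO actually updates the prompt text itself is not specified in the abstract, so the inheritance rule here is only a placeholder.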
