Large language models (LLMs) have made great progress in responding to user questions, enabling a wide range of applications. Yet the quality of LLM outputs depends heavily on prompt design: a good prompt can enable an LLM to answer a very challenging question correctly. Recent works have therefore developed many strategies for improving prompts, including both manual crafting and in-domain optimization. However, their efficacy in unrestricted scenarios remains questionable, as the former depends on human design for specific questions and the latter usually generalizes poorly to unseen scenarios. To address these problems, we give LLMs the freedom to design the best prompts for themselves. Specifically, we employ a hierarchy of LLMs that first constructs a prompt with precise instructions and accurate wording, layer by layer, and then uses this prompt to generate the final answer to the user query. We term this pipeline Hierarchical Multi-Agent Workflow, or HMAW. In contrast with prior works, HMAW imposes no human restriction, requires no training, and is completely task-agnostic while remaining capable of adjusting to the nuances of the underlying task. Through both quantitative and qualitative experiments across multiple benchmarks, we verify that, despite its simplicity, the proposed approach can create detailed and suitable prompts, further boosting the performance of current LLMs.
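The two-stage idea described in the abstract (a hierarchy of LLMs writes the prompt, then that prompt is used to answer the query) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `LLM` callable interface, the function name `hierarchical_answer`, the wording of the meta-prompt, and the number of layers. The stand-in `echo_llm` replaces a real model so the data flow can be followed without any API access.

```python
from typing import Callable, Sequence

# Hypothetical interface: prompt text in, completion text out.
LLM = Callable[[str], str]


def hierarchical_answer(query: str, designers: Sequence[LLM], answerer: LLM) -> str:
    """Pass instructions down through each designer layer, then answer the query."""
    instructions = ""  # no human-written instructions to start from
    for designer in designers:
        # Each layer sees the query plus the instructions from the layer above
        # and writes instructions for the layer below (assumed meta-prompt wording).
        meta_prompt = (
            "Write precise instructions for answering the user query.\n"
            f"Query: {query}\n"
            f"Instructions from the layer above: {instructions or '(none)'}"
        )
        instructions = designer(meta_prompt)
    # The final model answers the query under the generated instructions.
    final_prompt = f"Instructions: {instructions}\nQuery: {query}"
    return answerer(final_prompt)


def echo_llm(tag: str) -> LLM:
    """Stand-in model: returns its tag plus the last line of the prompt it received."""
    return lambda prompt: f"[{tag}] {prompt.splitlines()[-1]}"


result = hierarchical_answer(
    "What is 2 + 2?",
    designers=[echo_llm("top"), echo_llm("mid")],
    answerer=echo_llm("worker"),
)
print(result)
```

In a real setting each `LLM` callable would wrap a call to an actual model. Since no step involves training or task-specific templates, the same function can be applied to any query, which reflects the task-agnostic property claimed in the abstract.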