Recent advances in large language models (LLMs) have significantly enhanced the ability of LLM-based systems to perform complex tasks through natural language processing and tool interaction. However, optimizing these LLM-based systems for specific tasks remains challenging, often requiring manual interventions such as prompt engineering and hyperparameter tuning. Existing automatic optimization methods, such as textual feedback-based techniques (e.g., TextGrad), tend to focus on immediate feedback, analogous to using immediate derivatives in traditional numerical gradient descent. Yet relying solely on such feedback is limited when the adjustments made in response to it are either too small or fluctuate irregularly, which can slow down or even stall the optimization process. Overcoming these challenges calls for more adaptive methods, especially when the system's responses evolve slowly or unpredictably. In this paper, we introduce REVOLVE, an optimization method that tracks how "R"esponses "EVOLVE" across iterations in LLM systems. By focusing on the evolution of responses over time, REVOLVE enables more stable and effective optimization by making thoughtful, progressive adjustments at each step. Experimental results demonstrate that REVOLVE outperforms competitive baselines, achieving a 7.8% improvement in prompt optimization, a 20.72% gain in solution refinement, and a 29.17% increase in code optimization. Additionally, REVOLVE converges in fewer iterations, resulting in significant computational savings. These advantages highlight its adaptability and efficiency, positioning REVOLVE as a valuable tool for optimizing LLM-based systems and accelerating the development of next-generation AI technologies. Code is available at: https://github.com/Peiyance/REVOLVE.
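The abstract's analogy can be made concrete numerically. The sketch below is a loose illustration only, not the REVOLVE algorithm itself (which operates on textual feedback rather than numeric gradients): it contrasts an update that reacts only to the immediate derivative with one that also tracks how the signal evolves across iterations, modeled here as an exponential moving average in the style of momentum. All function names and constants are illustrative assumptions.

```python
# Numerical analogy only: REVOLVE works on textual feedback, not gradients.
# We compare (a) an update driven solely by the immediate derivative with
# (b) an update that accumulates how the feedback signal evolves over time,
# smoothing small or fluctuating signals instead of reacting to each one.

def immediate_update(x, grad, lr=0.1):
    # Analogous to acting on immediate textual feedback at each step.
    return x - lr * grad(x)

def evolution_aware_update(x, grad, history, lr=0.1, beta=0.9):
    # Blends the current derivative with the trajectory of past signals
    # (a momentum-style running average), an analogy for tracking how
    # responses evolve across iterations.
    history = beta * history + (1 - beta) * grad(x)
    return x - lr * history, history

# Toy quadratic objective with minimum at x = 3.
g = lambda x: 2.0 * (x - 3.0)

x_a = x_b = 0.0
h = 0.0
for _ in range(50):
    x_a = immediate_update(x_a, g)
    x_b, h = evolution_aware_update(x_b, g, h)

print(x_a, x_b)  # both approach the optimum at 3.0
```

On this smooth toy problem both updates converge; the evolution-aware variant's advantage, as the abstract argues, shows up when the per-step signal is weak or erratic, since the accumulated history keeps the optimization moving in a consistent direction.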