Can a single LLM-based optimization system match specialized tools across fundamentally different domains? We show that when optimization problems are formulated as improving a text artifact evaluated by a scoring function, a single AI-based optimization system (supporting single-task search, multi-task search with cross-problem transfer, and generalization to unseen inputs) achieves state-of-the-art results across six diverse tasks. Our system discovers agent architectures that nearly triple Gemini Flash's ARC-AGI accuracy (32.5% to 89.5%), finds scheduling algorithms that cut cloud costs by 40%, generates CUDA kernels of which 87% match or beat PyTorch, and outperforms AlphaEvolve's reported circle packing solution (n=26). Ablations across three domains reveal two effects. First, actionable side information yields faster convergence and substantially higher final scores than score-only feedback. Second, given an equivalent per-problem budget, multi-task search outperforms independent optimization through cross-task transfer, with benefits that scale with the number of related tasks. Together, we show for the first time that text optimization with LLM-based search is a general-purpose problem-solving paradigm, unifying tasks that traditionally require domain-specific algorithms under a single framework. We open-source optimize_anything, with support for multiple backends, as part of the GEPA project at https://github.com/gepa-ai/gepa.
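To make the formulation concrete, the following is a minimal sketch of optimizing a text artifact against a scoring function that also returns side information. It is not the optimize_anything API: the names `optimize_text`, `evaluate`, and `propose` are hypothetical, the artifact is a toy comma-separated list of integers, and a deterministic rule stands in for the LLM proposer. The search itself is a simple greedy loop chosen for brevity, not the search procedure used by our system.

```python
from typing import Callable, Tuple

# Hypothetical types for illustration only; not the optimize_anything API.
Evaluator = Callable[[str], Tuple[float, str]]  # artifact -> (score, side information)
Proposer = Callable[[str, float, str], str]     # (artifact, score, side info) -> new artifact


def optimize_text(seed: str, evaluate: Evaluator, propose: Proposer,
                  budget: int) -> Tuple[str, float]:
    """Greedy search: keep a proposal only if it scores strictly higher."""
    best = seed
    best_score, feedback = evaluate(seed)
    for _ in range(budget):
        candidate = propose(best, best_score, feedback)
        score, candidate_feedback = evaluate(candidate)
        if score > best_score:
            best, best_score, feedback = candidate, score, candidate_feedback
    return best, best_score


# Toy task (an assumption for this sketch): move every value toward TARGET.
TARGET = [5, 5, 5]


def evaluate(artifact: str) -> Tuple[float, str]:
    """Score is the negated total absolute error; side info names the worst entry."""
    values = [int(v) for v in artifact.split(",")]
    errors = [v - t for v, t in zip(values, TARGET)]
    score = float(-sum(abs(e) for e in errors))
    worst = max(range(len(errors)), key=lambda i: abs(errors[i]))
    if errors[worst] == 0:
        return score, "ok"
    direction = "high" if errors[worst] > 0 else "low"
    return score, f"{worst}:{direction}"


def propose(artifact: str, score: float, feedback: str) -> str:
    """Stand-in for an LLM call: apply the fix suggested by the side information."""
    if feedback == "ok":
        return artifact
    index, direction = feedback.split(":")
    values = [int(v) for v in artifact.split(",")]
    values[int(index)] += -1 if direction == "high" else 1
    return ",".join(str(v) for v in values)


best, best_score = optimize_text("3,9,1", evaluate, propose, budget=20)
```

In this toy setting the side information tells the proposer which entry to change and in which direction, so every proposal improves the score. A score-only evaluator would leave the proposer to guess, which is the contrast the side-information ablation measures.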